RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360645812

   ## Approach 2: Using a physical directory for each group
   
   
![approach2](https://github.com/user-attachments/assets/223686c4-5c0c-49c1-b54c-1aee22a2d1bf)
   
   To segregate segments belonging to different groups without attaching a 
prefix to segment names, we associate each group-level IndexWriter with a 
physical directory instead of a filter directory. The CompositeIndexWriter is 
linked to the top-level multi-tenant directory, while each group-level 
IndexWriter is connected to its own group-specific directory within that parent 
directory. Since the segments of each group now live in a separate directory, 
there is no need to prefix segment names, which solves the prefix naming issue 
of the previous approach. Having a separate IndexWriter per group also ensures 
that only segments belonging to the same group are merged together.
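
The directory layout described above can be sketched with plain `java.nio.file` 
calls. This is a minimal illustration, not code from the prototype: the class 
name `GroupDirectoryLayout` and the method `directoryFor` are hypothetical, and 
the sketch assumes that a group name maps directly to a subdirectory name under 
the shard's top-level directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Lucene or OpenSearch): maps each group
// to its own physical directory under the shard's top-level directory.
final class GroupDirectoryLayout {
    private final Path shardRoot;

    GroupDirectoryLayout(Path shardRoot) {
        this.shardRoot = shardRoot.normalize();
    }

    /** Returns the physical directory for one group, creating it if needed. */
    Path directoryFor(String group) throws IOException {
        Path dir = shardRoot.resolve(group).normalize();
        Path parent = dir.getParent();
        // Only direct children of the shard root are accepted, so a group
        // name such as "..", "" or "a/b" cannot escape or nest the layout.
        if (parent == null || !parent.equals(shardRoot)) {
            throw new IllegalArgumentException("Invalid group name: " + group);
        }
        return Files.createDirectories(dir);
    }
}
```

In a real setup, each returned path would presumably be opened with 
`FSDirectory.open(path)` and handed to that group's own `IndexWriter`. Because 
every group writes into its own directory, segment file names such as `_0.cfs` 
can repeat across groups without colliding.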
   
   ### Pros
   
   1. Having a separate directory for each group’s IndexWriter reduces the 
chance that any of Lucene’s internal calls fail.
   
   ### Cons
   
   1. Multiple IndexWriters still do not function as a single entity when 
interacting with the client (OpenSearch). Each IndexWriter has its own 
associated SegmentInfos, index commit, SegmentInfos generation and version. 
This breaks multiple features such as segment replication and its derivative, 
remote store. For example, in a remote-store-enabled cluster, we maintain a 
replica of the shard (a single Lucene index) on separate remote storage (such 
as S3). To achieve this, during each checkpoint, we take a snapshot of the 
current generation of SegmentInfos associated with the Lucene index and upload 
the associated files, along with a metadata file (associated with a generation 
of SegmentInfos), to the remote store. With multiple IndexWriters for the same 
shard, a list of SegmentInfos (one for each group) will be associated with the 
shard instead. We can handle this by creating a list of snapshots and their 
separate metadata files, but this translates to maintaining separate Lucene 
indexes for each shard, essentially turning each segment group into a shard on 
the client (OpenSearch) end.
   2. To address the above issue, we can try creating a common wrapper for the 
list of SegmentInfos, similar to what we did for IndexWriters with 
CompositeIndexWriter. However, this approach also has issues, as the common 
wrapper would need a common generation and version. Additionally, it should be 
possible to associate the common wrapper with a specific index commit, to allow 
opening a CompositeIndexWriter at a specific index commit point. Furthermore, 
when a CompositeIndexWriter is opened using a commit point, it should be 
possible to open all the group-level sub-IndexWriters at that index commit 
point. While this is doable, it is extremely complex to implement.
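
To make the bookkeeping behind the second point concrete, here is a toy model 
of what a common wrapper would at least have to track: one composite generation 
per commit, mapped to the generation each group's SegmentInfos had at that 
moment. The class `CompositeCommitLog` and its methods are hypothetical and use 
no Lucene API. The sketch also ignores the hard parts (committing all groups 
atomically, persisting the mapping, and a common version).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model (not a Lucene API): records, for every composite
// commit, which per-group generation belongs to it.
final class CompositeCommitLog {
    // composite generation -> (group name -> that group's own generation)
    private final TreeMap<Long, Map<String, Long>> commits = new TreeMap<>();
    private long generation = 0;

    /** Records one composite commit and returns its common generation. */
    long commit(Map<String, Long> groupGenerations) {
        generation++;
        // Defensive copy so later changes by the caller do not alter history.
        commits.put(generation,
                Collections.unmodifiableMap(new LinkedHashMap<>(groupGenerations)));
        return generation;
    }

    /** Looks up the per-group generations to reopen for a composite commit. */
    Map<String, Long> groupGenerationsAt(long compositeGeneration) {
        Map<String, Long> snapshot = commits.get(compositeGeneration);
        if (snapshot == null) {
            throw new IllegalArgumentException(
                    "Unknown composite generation: " + compositeGeneration);
        }
        return snapshot;
    }
}
```

Opening a composite writer at composite generation N would then mean opening 
every group-level writer at the generation recorded for it under N. Keeping 
that mapping consistent across failures is where most of the complexity 
mentioned above would come from.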
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

