RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360645812
## Approach 2: Using a physical directory for each group

![approach2](https://github.com/user-attachments/assets/223686c4-5c0c-49c1-b54c-1aee22a2d1bf)

To segregate segments belonging to different groups and avoid attaching a prefix to segment names, we associated each group-level IndexWriter with a physical directory instead of a filter directory. The CompositeIndexWriter is linked to the top-level multi-tenant directory, while the group-level IndexWriters are connected to individual group-specific directories within the parent directory. Since the segments of each group are now in separate directories, there is no need to prefix segment names, which solves the prefix name issue of the above approach. Separate IndexWriters ensure that only segments belonging to the same group are merged together.

### Pros

1. Having a different directory for each group’s IndexWriter reduces the chance that any of Lucene’s internal calls fail.

### Cons

1. Multiple IndexWriters still do not function as a single entity when interacting with the client (OpenSearch). Each IndexWriter has its own associated SegmentInfos, index commit, SegmentInfos generation and version. This breaks multiple features like segment replication and its derivative, remote store. For example, in a remote-store-enabled cluster, we maintain a replica of the shard (a single Lucene index) on separate remote storage (such as S3). To achieve this, during each checkpoint, we take a snapshot of the current generation of SegmentInfos associated with the Lucene index and upload the associated files, along with a metadata file (associated with a generation of SegmentInfos), to the remote store. Now, with multiple IndexWriters for the same shard, a list of SegmentInfos (one for each group) will be associated with the shard.
   We can handle this by creating a list of snapshots and their separate metadata files, but this essentially translates to maintaining separate Lucene indexes for each shard, effectively turning each segment group into a shard on the client (OpenSearch) end.

2. To address the above issue, we can try creating a common wrapper for the list of SegmentInfos, similar to what we did for IndexWriters with CompositeIndexWriter. However, this approach also has issues, as the common wrapper would need a common generation and version. Additionally, it should be possible to associate the common wrapper with a specific index commit, to allow opening a CompositeIndexWriter at a specific index commit point. Furthermore, when a CompositeIndexWriter is opened using a commit point, it should be possible to open all the group-level sub-IndexWriters at that index commit point. While this is doable, it is extremely complex to implement.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org