RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360641099
Thanks [mikemccand](https://github.com/mikemccand) and [vigyasharma](https://github.com/vigyasharma) for the suggestions. I evaluated different approaches to using a separate IndexWriter for each group:

## Approach 1: Using a filter directory for each group

![approach1](https://github.com/user-attachments/assets/857b6bad-8e31-480a-8b3e-c9af06479b9e)

In this approach, each group (in the example above, the grouping criterion is the status code) has its own IndexWriter, associated with a distinct logical filter directory that attaches a filename prefix to the segments according to their group (200_, 400_, etc.). These directories are backed by a single physical directory. Since each group's segments are managed by a different IndexWriter, segments are only ever merged with other segments of the same group.

A CompositeIndexWriter wraps the group-level IndexWriters for client (OpenSearch) interaction. When adding or updating a document, the CompositeIndexWriter delegates the operation to the corresponding criteria-specific IndexWriter. The CompositeIndexWriter is associated with the top-level physical directory.

To address sequence number conflicts between different IndexWriters, a common sequence number generator is used for all IndexWriters within a shard. This ensures that sequence numbers increase continuously across the IndexWriters in the same shard.

### Pros

1. Using separate IndexWriters for different groups ensures that documents from different groups are categorised into distinct segments. This approach also eliminates the need to modify the merge policy.
2. Using a common sequence number generator prevents sequence number conflicts among IndexWriters belonging to the same shard. However, since sequence number generation is delegated to the client (OpenSearch), the client must ensure that the sequence numbers are monotonically increasing.

### Cons

1. Lucene internally searches for files starting with segments_ or pending_segments_ for operations such as getting the last commit generation of an index, and for write.lock to check whether a lock is held on the directory. Attaching a prefix to these file names may break Lucene's internal operations.
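To make the trade-off above concrete, here is a minimal, Lucene-free sketch of Approach 1. Every name in it (`PhysicalDir`, `PrefixedView`, `GroupWriter`, `CompositeWriter`, `findCommitFiles`) is hypothetical and only models the idea; none of them is a Lucene or OpenSearch API. It assumes files are plain name strings, that a commit simply writes a `segments_N` file, and that a commit lookup is a scan for names starting with `segments_`, which is a simplification of the behaviour described in the Cons item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of Approach 1; these are NOT Lucene classes.
class PrefixedGroupsSketch {

    // Stands in for the single physical directory: just a list of file names.
    static final class PhysicalDir {
        final List<String> files = new ArrayList<>();
    }

    // Stands in for a per-group filter directory: prepends "<group>_" to every name.
    static final class PrefixedView {
        final String prefix;
        final PhysicalDir physical;

        PrefixedView(String group, PhysicalDir physical) {
            this.prefix = group + "_";
            this.physical = physical;
        }

        void createFile(String logicalName) {
            physical.files.add(prefix + logicalName);
        }

        // Logical (un-prefixed) names of the files that belong to this group.
        List<String> list() {
            List<String> out = new ArrayList<>();
            for (String f : physical.files) {
                if (f.startsWith(prefix)) out.add(f.substring(prefix.length()));
            }
            return out;
        }
    }

    // Stands in for a group-level writer; the sequence counter is shared per shard.
    static final class GroupWriter {
        final PrefixedView dir;
        final AtomicLong sharedSeqNo;
        long generation = 0;

        GroupWriter(PrefixedView dir, AtomicLong sharedSeqNo) {
            this.dir = dir;
            this.sharedSeqNo = sharedSeqNo;
        }

        long addDocument(String doc) {
            return sharedSeqNo.incrementAndGet();
        }

        void commit() {
            generation++;
            dir.createFile("segments_" + generation);
        }
    }

    // Stands in for the composite writer: routes each document to its group's writer.
    static final class CompositeWriter {
        final PhysicalDir physical = new PhysicalDir();
        final AtomicLong seqNo = new AtomicLong();
        final Map<String, GroupWriter> writers = new TreeMap<>();

        long addDocument(String group, String doc) {
            GroupWriter w = writers.computeIfAbsent(
                group, g -> new GroupWriter(new PrefixedView(g, physical), seqNo));
            return w.addDocument(doc);
        }

        void commit() {
            writers.values().forEach(GroupWriter::commit);
        }
    }

    // Simplified commit lookup: scan for names that start with "segments_".
    static List<String> findCommitFiles(List<String> names) {
        List<String> hits = new ArrayList<>();
        for (String n : names) {
            if (n.startsWith("segments_")) hits.add(n);
        }
        return hits;
    }

    public static void main(String[] args) {
        CompositeWriter writer = new CompositeWriter();
        System.out.println(writer.addDocument("200", "doc-a")); // 1
        System.out.println(writer.addDocument("400", "doc-b")); // 2
        System.out.println(writer.addDocument("200", "doc-c")); // 3
        writer.commit();

        // Scanning the physical directory finds no commit file, because every name is prefixed.
        System.out.println(writer.physical.files);                  // [200_segments_1, 400_segments_1]
        System.out.println(findCommitFiles(writer.physical.files)); // []
        // Scanning through a group's view does find that group's commit file.
        System.out.println(findCommitFiles(writer.writers.get("200").dir.list())); // [segments_1]
    }
}
```

In this model the shared counter keeps sequence numbers increasing across groups, while the prefixing means any code that inspects the physical directory directly, rather than through a group's view, no longer sees a `segments_N` file. That is the kind of breakage the Cons item warns about.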