RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2360641099

   Thanks [mikemccand](https://github.com/mikemccand) and 
[vigyasharma](https://github.com/vigyasharma) for the suggestions. Evaluated 
different approaches for using a separate IndexWriter for each group:
   
   ## Approach 1: Using filter directory for each group
   
   
![approach1](https://github.com/user-attachments/assets/857b6bad-8e31-480a-8b3e-c9af06479b9e)
   
   In this approach, each group (in the example above, the grouping criterion 
is the status code) has its own IndexWriter, associated with a distinct logical 
filter directory that attaches a filename prefix to segment files according to 
their group (200_, 400_, etc.). These directories are backed by a single 
physical directory. Since each IndexWriter manages only the segments of its own 
group, merges only ever combine segments belonging to the same group. A 
CompositeIndexWriter wraps the group-level IndexWriters for client (OpenSearch) 
interaction. When a document is added or updated, the CompositeIndexWriter 
delegates the operation to the corresponding criteria-specific IndexWriter. The 
CompositeIndexWriter is associated with the top-level physical directory.
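
The routing described above can be sketched in plain Java. This is a toy 
model only, not Lucene or OpenSearch code: `GroupWriter` and 
`CompositeWriterSketch` are hypothetical names standing in for a group-level 
IndexWriter and the CompositeIndexWriter, and the prefixing merely mimics what 
a filter directory would do to file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a group-level IndexWriter; it only records documents.
class GroupWriter {
    final String prefix;                       // e.g. "200_"
    final List<String> docs = new ArrayList<>();

    GroupWriter(String prefix) {
        this.prefix = prefix;
    }

    synchronized void addDocument(String doc) {
        docs.add(doc);
    }

    // Name a logical file would get in the shared physical directory.
    String physicalName(String logicalName) {
        return prefix + logicalName;
    }
}

// Hypothetical composite writer: one GroupWriter per grouping key, created lazily.
class CompositeWriterSketch {
    private final Map<String, GroupWriter> writers = new ConcurrentHashMap<>();

    GroupWriter writerFor(String group) {
        return writers.computeIfAbsent(group, g -> new GroupWriter(g + "_"));
    }

    // Delegate the operation to the writer that owns the document's group.
    void addDocument(String group, String doc) {
        writerFor(group).addDocument(doc);
    }
}
```

Because each group has exactly one writer, documents from different groups 
never share a writer in this model, which is the property the approach relies 
on to keep merges within a group.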
   
   To address sequence number conflicts between different IndexWriters, a 
common sequence number generator was used for all IndexWriters within a shard. 
This ensures that sequence numbers keep increasing continuously across the 
IndexWriters in the same shard.
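
A minimal sketch of such a shared generator, assuming a single atomic counter 
per shard. `SharedSeqNoGenerator` and `SeqNoAwareWriter` are hypothetical names 
for illustration; how sequence numbers are actually plumbed into each 
IndexWriter is not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shard-wide generator: every writer draws from the same counter.
class SharedSeqNoGenerator {
    private final AtomicLong next = new AtomicLong(0);

    long nextSeqNo() {
        return next.getAndIncrement();
    }
}

// Hypothetical stand-in for a group-level writer that stamps each operation.
class SeqNoAwareWriter {
    private final SharedSeqNoGenerator generator;

    SeqNoAwareWriter(SharedSeqNoGenerator generator) {
        this.generator = generator;
    }

    // Returns the sequence number assigned to this operation.
    long addDocument(String doc) {
        return generator.nextSeqNo();
    }
}
```

With two writers built on the same generator, interleaved calls receive 
distinct, increasing numbers regardless of which writer handles them.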
   
   ### Pros
   
   1. Using separate IndexWriters for different groups ensures that documents 
from different groups are categorised into distinct segments. This approach 
also eliminates the need to modify the merge policy.
   2. Using a common sequence number generator prevents sequence number 
conflicts among IndexWriters belonging to the same shard. However, since 
sequence number generation is delegated to the client (OpenSearch), the client 
must ensure that the sequence numbers are monotonically increasing.
   
   ### Cons
   
   1. Lucene internally searches for files whose names start with segments_ or 
pending_segments_ for operations such as getting the last commit generation of 
an index, and for write.lock to check whether a lock is held on the directory. 
Attaching a prefix to these file names may break Lucene’s internal operations.
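
To illustrate the concern, here is a simplified stand-in for a name-based 
commit scan. It is not Lucene's code: `CommitScanSketch` is a hypothetical 
class, and it assumes only that commit files are named segments_ followed by a 
base-36 generation and that the scan matches on the start of the file name.

```java
// Hypothetical, simplified model of a scan for the latest commit file.
class CommitScanSketch {
    static final String SEGMENTS_PREFIX = "segments_";

    // Returns the highest generation found, or -1 if no commit file matches.
    static long lastCommitGeneration(String[] files) {
        long max = -1;
        for (String file : files) {
            if (file.startsWith(SEGMENTS_PREFIX)) {
                String suffix = file.substring(SEGMENTS_PREFIX.length());
                // Assumption: the generation is encoded in base 36.
                long gen = Long.parseLong(suffix, Character.MAX_RADIX);
                max = Math.max(max, gen);
            }
        }
        return max;
    }
}
```

In this model, a listing such as `segments_2`, `segments_a` is recognised, 
whereas `200_segments_2`, `400_segments_a` yields no match at all, so any 
caller that inspects the physical directory directly would see no commit.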
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

