RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2470112408

   After some more analysis, I found an approach that addresses all the 
comments above and obtains the same improvement with a different IndexWriter 
per group as we got with different DWPTs per group.
   
   ## Using separate IndexWriter for maintaining different tenants with a combined view
   
   ### Current Issue
   
   Maintaining a separate IndexWriter for each group (tenant) presents a 
significant problem: the writers do not function as a single unified entity. 
Although distinct IndexWriters and directories for each group ensure that data 
belonging to different groups is kept in separate segments and that only 
segments within the same group are merged, the client (OpenSearch) still needs 
a unified read-only view to interact with these multiple group-level 
IndexWriters.
   
   Lucene’s addIndexes API offers a way to combine group-level IndexWriters 
into a single parent-level IndexWriter, but this approach has multiple drawbacks:
   
   1. Since writes may continue on group-level IndexWriters, periodic 
synchronisation with the parent-level IndexWriter is necessary.
   2. During synchronisation, an external lock needs to be placed on the 
group-level IndexWriter directory, causing downtime.
   3. Synchronisation also involves copying files from the group-level 
IndexWriter directory to the parent IndexWriter directory, which is 
resource-intensive and consumes disk IO and CPU cycles.
   
   ### Proposal
   
   To address this issue, we propose introducing a mechanism that combines 
group-level IndexWriters as a soft reference to a parent IndexWriter. This will 
be achieved by creating a new variant of the addIndexes API within IndexWriter, 
which will only combine the SegmentInfos of group-level IndexWriter without 
requiring an external lock or copying files across directories. Group-level 
segments will be maintained in separate directories associated with their 
respective group-level IndexWriters.
   
   The client will call this addIndexes API on the parent IndexWriter 
periodically (on the OpenSearch side this corresponds to the index refresh 
interval of 1 sec), passing the SegmentInfos of the child-level IndexWriters as 
parameters to sync the latest SegmentInfos with the parent IndexWriter. While 
combining the SegmentInfos of child-level IndexWriters, the addIndexes API will 
attach a prefix to the segment names to identify the group each segment belongs 
to, avoiding name conflicts between segments of different group-level 
IndexWriters.
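
   The prefixing step can be illustrated with a minimal, Lucene-free sketch. 
Everything here is an assumption made for illustration: the class 
`CombinedSegmentView`, its methods, and the naming scheme (group id 
concatenated directly with the segment name, so group `200` and segment `_0` 
become `200_0`) are hypothetical and are not part of the proposed addIndexes 
variant or of any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model (not a Lucene API): builds one combined list of segment names. */
final class CombinedSegmentView {
    private CombinedSegmentView() {}

    /** Prefixes a segment name with its group id, e.g. ("200", "_0") becomes "200_0". */
    static String prefixed(String groupId, String segmentName) {
        // Assumption: group ids never contain '_' or '.', so the first '_' in the
        // combined name always marks the end of the prefix.
        if (groupId.isEmpty() || groupId.contains("_") || groupId.contains(".")) {
            throw new IllegalArgumentException("invalid group id: " + groupId);
        }
        // Lucene segment names start with '_' (for example "_0" or "_1a").
        if (!segmentName.startsWith("_")) {
            throw new IllegalArgumentException("invalid segment name: " + segmentName);
        }
        return groupId + segmentName;
    }

    /** Combines the segment names of every group, in the iteration order of the map. */
    static List<String> combine(Map<String, List<String>> segmentsByGroup) {
        List<String> combined = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : segmentsByGroup.entrySet()) {
            for (String segmentName : entry.getValue()) {
                combined.add(prefixed(entry.getKey(), segmentName));
            }
        }
        return combined;
    }
}
```

   With this scheme, two groups that each own a segment named `_0` end up with 
distinct names in the combined view, which is the conflict the prefix is meant 
to avoid.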
   
   ![compositeIndexWriter drawio 
(1)](https://github.com/user-attachments/assets/8ddd8568-a352-41ac-bc42-ce3cb4647f8f)
   
   The parent IndexWriter will be associated with a filter directory that 
distinguishes the tenant by the file name prefix and redirects any read/write 
operation on a file to the correct group-level directory.
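
   As a rough sketch of the redirection, the routing decision alone can be 
modelled with plain `java.nio.file.Path` values. The class `GroupFileRouter`, 
the group ids, and the `<group><segment file name>` naming scheme are 
assumptions for illustration; an actual implementation would presumably wrap 
Lucene `Directory` instances rather than paths.

```java
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical router (not a Lucene API): maps a prefixed file name to its group directory. */
final class GroupFileRouter {
    private final Map<String, Path> directoryByGroup;

    GroupFileRouter(Map<String, Path> directoryByGroup) {
        this.directoryByGroup = Map.copyOf(directoryByGroup);
    }

    /** Resolves e.g. "200_0.cfs" to the file "_0.cfs" inside the directory of group "200". */
    Path resolve(String prefixedFileName) {
        // Assumption: the group prefix is everything before the first '_'.
        int prefixEnd = prefixedFileName.indexOf('_');
        if (prefixEnd <= 0) {
            throw new IllegalArgumentException("no group prefix in: " + prefixedFileName);
        }
        String group = prefixedFileName.substring(0, prefixEnd);
        Path groupDirectory = directoryByGroup.get(group);
        if (groupDirectory == null) {
            throw new IllegalArgumentException("unknown group: " + group);
        }
        // Strip the prefix so the file is looked up under its original name.
        return groupDirectory.resolve(prefixedFileName.substring(prefixEnd));
    }
}
```

   The file keeps its original name inside the group-level directory, so only 
the combined view ever sees the prefixed name.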
   
   #### Reason for choosing common view as an IndexWriter
   
   Most interactions of Lucene with the client (OpenSearch), such as opening a 
reader, getting the latest commit info, reopening a Lucene index, etc., occur 
via IndexWriter itself. Selecting IndexWriter as the common view therefore made 
the most sense.
   
   ### Improvements with multiple IndexWriter with a combined view
   
   We observed around 50% - 60% improvement with the multiple IndexWriter with 
a combined view approach, similar to what we observed with different DWPTs for 
different tenants (initial proposal).
   
   ### Considerations
   
   1. The referencing IndexWriter will be a combined read-only view for the 
group-level IndexWriters. Since this IndexWriter does not itself have any 
segments and only references the SegmentInfos of other IndexWriters, write 
operations such as segment merge, flush, etc. should not be performed on this 
parent IndexWriter instance.
   2. We need to account for the prefix attached before segment names when 
[parsing segment 
names](https://github.com/RS146BIJAY/lucene/blob/84811e974f38181b0c1f1e1b5655f674a1584385/lucene/core/src/java/org/apache/lucene/index/IndexFileNames.java#L119).
   3. It will be difficult to support update queries with the multi-IndexWriter 
approach. For example, if we are grouping logs on status code and a user 
updates the status code field of a log, then for Lucene the insert and update 
operations need to be performed on separate delete queues.
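
   To make the second consideration concrete, here is a minimal sketch of 
prefix-aware parsing. It assumes the same hypothetical naming scheme of 
`<group><segment name>` (for example `200_0.cfs`); the class 
`PrefixedSegmentName` is illustrative only and is not the code in the linked 
`IndexFileNames` change.

```java
/** Hypothetical parser (not a Lucene API) for file names of the form "<group><segment>...". */
final class PrefixedSegmentName {
    final String group;
    final String segment;

    private PrefixedSegmentName(String group, String segment) {
        this.group = group;
        this.segment = segment;
    }

    /** Splits e.g. "200_1a_Lucene90_0.doc" into group "200" and segment "_1a". */
    static PrefixedSegmentName parse(String fileName) {
        // Assumption: the group prefix contains no '_' and ends at the first '_'.
        int groupEnd = fileName.indexOf('_');
        if (groupEnd <= 0) {
            throw new IllegalArgumentException("no group prefix in: " + fileName);
        }
        // The segment name starts at that '_' and ends at the next '_' or, failing
        // that, at the first '.'; a bare segment name runs to the end of the string.
        int segmentEnd = fileName.indexOf('_', groupEnd + 1);
        if (segmentEnd == -1) {
            segmentEnd = fileName.indexOf('.', groupEnd + 1);
        }
        if (segmentEnd == -1) {
            segmentEnd = fileName.length();
        }
        return new PrefixedSegmentName(
            fileName.substring(0, groupEnd), fileName.substring(groupEnd, segmentEnd));
    }
}
```

   Any parsing code that assumes a file name begins with `_` would need a 
similar adjustment to skip the prefix first.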
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

