rclabo commented on issue #763:
URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1319966984

   > After Merge() operation, we immediately collect the output files and store 
them to cloud storage. Seems the Lucene Index is valid already and we can run 
query against it successfully. So, there should be no required post write 
operations.
   
   I'm not sure I fully understand.  When you call `IndexWriter.Commit`, that 
thread writes a new index segment to storage, handled at a top level by 
`DocumentsWriterPerThread` (which I had forgotten was introduced in Lucene 4).  
That index segment on disk will be fairly small, given that it had to fit in RAM 
in its entirety before being written to disk.  
   
   Then, in the default configuration, the 
[ConcurrentMergeScheduler](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/core/Lucene.Net.Index.ConcurrentMergeScheduler.html)
 will run on background threads and merge that tiny segment with other segments 
to create a new larger segment.  This process repeats itself over and over as 
new documents are written to the index and committed.  
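
   As a rough illustration of that commit-then-merge cycle, here is a toy model in Python.  It is only a sketch under loud assumptions: every commit flushes one segment of the same size, and a hypothetical policy merges any `merge_factor` segments of equal size into one.  The real merge policies in Lucene choose merges differently, and the names `commit` and `merge_factor` are made up for this example.

```python
from collections import Counter


def commit(segments, new_docs, merge_factor=10):
    """Toy model: flush one new segment, then cascade merges.

    `segments` is a list of segment sizes (document counts).
    Assumption: whenever `merge_factor` segments share the same size,
    they are merged into a single larger segment. This is NOT how
    Lucene's merge policies actually pick merges.
    """
    segments = sorted(segments + [new_docs], reverse=True)
    while True:
        counts = Counter(segments)
        # Smallest size that has enough equal-sized segments to merge.
        size = next(
            (s for s, n in sorted(counts.items()) if n >= merge_factor),
            None,
        )
        if size is None:
            return segments
        for _ in range(merge_factor):
            segments.remove(size)
        segments.append(size * merge_factor)
        segments.sort(reverse=True)


segments = []
for _ in range(25):
    segments = commit(segments, new_docs=1)
print(segments)  # [10, 10, 1, 1, 1, 1, 1]
```

   After 25 one-document commits the model holds two merged segments of 10 documents plus five small ones that are still waiting for a merge, which is the same shape the video shows at a much larger scale.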
   
   [This video](https://www.youtube.com/watch?v=YW0bOvLp72E) shows the 
segment writing and merging that happens as Java Lucene indexes Wikipedia.  It 
creates a nice visual of the process we are talking about.  You can see all the 
small initial segments being written as various commits are called and then see 
the segments getting combined into larger segments by the background workers.  
And eventually those larger segments get combined to create even larger 
segments and so on.
   
   So when you say "After Merge() operation, we immediately collect the output 
files and store them to cloud storage", that would only work well if there is 
no need to add additional documents to the index.  Otherwise, without ongoing 
merging, the number of segments will grow very large and search times will 
become slower and slower, because every search has to visit every segment.
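
   To put rough numbers on that, here is a back-of-the-envelope sketch in Python.  It assumes one equal-sized segment per commit and a hypothetical policy that merges every `merge_factor` equal-sized segments into one.  Under that assumption the number of live segments is the digit sum of the commit count written in base `merge_factor`.  The function names are invented for this example and the real numbers depend on the merge policy in use.

```python
def segments_without_merging(commits):
    """Every commit leaves one segment behind and nothing combines them."""
    return commits


def segments_with_merging(commits, merge_factor=10):
    """Toy model: every `merge_factor` equal-sized segments become one.

    The live segment count is then the digit sum of `commits`
    in base `merge_factor`.
    """
    total = 0
    while commits:
        commits, digit = divmod(commits, merge_factor)
        total += digit
    return total


for commits in (25, 999, 99_999):
    print(
        commits,
        segments_without_merging(commits),
        segments_with_merging(commits),
    )
# 25 25 7
# 999 999 27
# 99999 99999 45
```

   Since a query has to consult every segment, the gap between 99,999 segments and 45 is the gap between the two setups in this toy model.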
   
   I hope this information is helpful.

