rclabo commented on issue #763: URL: https://github.com/apache/lucenenet/issues/763#issuecomment-1319966984
> After Merge() operation, we immediately collect the output files and store them to cloud storage. Seems the Lucene Index is valid already and we can run query against it successfully. So, there should be no required post write operations.

I'm not sure I fully understand. When you call `IndexWriter.Commit`, that thread writes a new index segment to storage, handled at a top level by `DocumentsWriterPerThread` (which I had forgotten was introduced in Lucene 4). That index segment on disk will be fairly small, given that it had to fit in RAM in its entirety before being written to disk. Then, in the default configuration, the [ConcurrentMergeScheduler](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/core/Lucene.Net.Index.ConcurrentMergeScheduler.html) will run on background threads and merge that tiny segment with other segments to create a new, larger segment. This process repeats over and over as new documents are written to the index and committed.

[This video](https://www.youtube.com/watch?v=YW0bOvLp72E) shows the segment writing and merging that happens as Java Lucene indexes Wikipedia. It creates a nice visual of the process we are talking about. You can see all the small initial segments being written as the various commits are called, and then see the segments getting combined into larger segments by the background workers. Eventually those larger segments get combined to create even larger segments, and so on.

So when you say "After Merge() operation, we immediately collect the output files and store them to cloud storage," that would only work well if there is no need to add additional documents to the index. Otherwise, without ongoing merging, the number of segments will grow very large and search times will become slower and slower, because every search has to visit every segment.

I hope this information is helpful.
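To make the segment-count argument concrete, here is a rough toy model in Python. It is only a sketch and does not use Lucene or Lucene.NET at all: the function names (`simulate`, `merge_full_tiers`), the `merge_factor` parameter, and the rule "merge whenever N equal-sized segments exist" are all assumptions made up for illustration, not the actual merge policy. It only shows how the number of segments behaves when every commit flushes one small segment, with and without ongoing merging.

```python
from collections import Counter


def merge_full_tiers(segments, merge_factor):
    """Repeatedly merge any `merge_factor` equal-sized segments into one.

    Hypothetical, simplified rule for illustration only; real merge
    policies weigh segment sizes, deletes, and I/O cost.
    """
    segments = list(segments)
    merged = True
    while merged:
        merged = False
        for size, count in Counter(segments).items():
            if count >= merge_factor:
                # Drop `merge_factor` small segments, add one larger one.
                for _ in range(merge_factor):
                    segments.remove(size)
                segments.append(size * merge_factor)
                merged = True
                break  # sizes changed, so recount from scratch
    return segments


def simulate(commits, docs_per_commit, merge_factor=None):
    """Return the segment sizes (in documents) left after `commits` commits.

    Each commit is modeled as flushing exactly one small segment.
    Passing merge_factor=None models an index that is never merged.
    """
    segments = []
    for _ in range(commits):
        segments.append(docs_per_commit)
        if merge_factor is not None:
            segments = merge_full_tiers(segments, merge_factor)
    return segments


if __name__ == "__main__":
    without_merging = simulate(1234, 100)
    with_merging = simulate(1234, 100, merge_factor=10)
    print("segments without merging:", len(without_merging))
    print("segments with merging:   ", len(with_merging))
```

In this model, 1234 commits of 100 documents each leave 1234 segments when nothing is merged, but only 10 segments when merging keeps up, even though both hold the same 123,400 documents. A search has to visit each segment, which is why the unmerged case gets slower as commits accumulate.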