Hello everyone,

We are considering switching from regular to NRT readers, hoping to improve overall indexing/query throughput and reduce turnaround time (how quickly a document becomes searchable). I ran some benchmarks, mostly to understand how much benefit we can get and to make sure I'm implementing everything correctly.
To my surprise, no matter how I tweak it, our indexing throughput is 10% lower with NRT, and query throughput (queries run in parallel with indexing) is pretty much the same. I do see an almost 5x turnaround time improvement, though. Maybe I have the wrong expectations, and less frequent commits with NRT refreshes were never intended to improve overall throughput?

Some details about the tests:
- The base implementation commits and refreshes a regular reader every second. The NRT implementation commits every 60 seconds and refreshes the NRT reader every second.
- Indexing rate is about 23 MB/sec, query rate ~300 rps (text search with ~50 ms average latency). Average document size is about 35 KB.
- Tests run on a 36-core machine. I don't see a big difference in JVM metrics between the tests, and there is no obvious bottleneck in CPU/memory/disk utilization (everything is well below 100%).
- NRT readers are implemented using SearcherManager, the same way as in the Lucene benchmark repository <https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/NRTPerfTest.java>.
- With NRT, commit latency is about 3 sec and average refresh latency is 150 ms. In the base approach, commit latency is about 500 ms and refresh latency is 300 ms.

I tried NRTCachingDirectory (with both MMapDirectory and NIOFSDirectory), insert vs. update workloads, `applyAllDeletes=false`, and a single indexing thread - nothing helps match the base version's throughput.

I'd appreciate any advice. Am I missing something obvious, or is the expectation that NRT with less frequent commits will be more performant/resource-efficient simply incorrect?

--
Regards,
Alex
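For context, the NRT side of my test is set up roughly like this (a simplified sketch, not our exact code - the class name, index path, cache sizes, and scheduler are illustrative; the SearcherManager usage follows the standard pattern):

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtSketch {
    public static void main(String[] args) throws Exception {
        // Writer over an NRTCachingDirectory-wrapped FSDirectory
        // (5 MB max cached merge size, 60 MB total cache -- illustrative values).
        Directory dir = new NRTCachingDirectory(FSDirectory.open(Path.of("index")), 5.0, 60.0);
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // NRT reader: SearcherManager opens searchers directly from the writer,
        // so refreshes see not-yet-committed segments.
        SearcherManager manager = new SearcherManager(writer, new SearcherFactory());

        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // Refresh the NRT view every second...
        scheduler.scheduleAtFixedRate(() -> {
            try { manager.maybeRefresh(); } catch (IOException e) { /* log */ }
        }, 1, 1, TimeUnit.SECONDS);
        // ...but commit durably only every 60 seconds.
        scheduler.scheduleAtFixedRate(() -> {
            try { writer.commit(); } catch (IOException e) { /* log */ }
        }, 60, 60, TimeUnit.SECONDS);

        // Query path: acquire/release a searcher around each search.
        IndexSearcher searcher = manager.acquire();
        try {
            // searcher.search(query, 10); ...
        } finally {
            manager.release(searcher);
        }
    }
}
```

The base implementation is the same except the writer commits every second and a regular reader is reopened from the directory instead of from the writer.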