Hello everyone,

We are considering switching from regular to NRT readers, hoping it would
improve overall indexing/querying throughput and also reduce the
turnaround time (index-to-search visibility latency).
I did some benchmarks, mostly to understand how much benefit we can get and
make sure I'm implementing everything correctly.

To my surprise, no matter how I tweak it, our indexing throughput is 10%
lower with NRT, and query throughput (goes in parallel with indexing) is
pretty much the same. I do see almost x5 turnaround time improvement though.
Maybe I have wrong expectations, and less frequent commits with NRT refresh
were not intended to improve overall performance?

Some details about the tests -
Base implementation commits and refreshes a regular reader every second.
NRT implementation commits every 60 seconds and refreshes NRT reader every
second.
The indexing rate is about 23 MB/sec, query rate ~300 rps (text search with
~50 ms average latency). Average document size is about 35 KB.
A 36-core machine is used for the tests, and I don't see a big difference in
JVM metrics between the tests. Also, there is no obvious bottleneck in
CPU/memory/disk utilization (everything is well below 100%).
NRT readers are implemented using SearcherManager, mirroring the
implementation in the Lucene benchmark repository
<https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/NRTPerfTest.java>.
With NRT, commit latency is about 3 sec and average refresh latency is 150 ms.
In the base approach, commit latency is about 500 ms and refresh latency 300 ms.
I tried NRTCachingDirectory (with MmapDirectory and NIOFSDirectory), insert
vs update workload, `applyAllDeletes=false`, single indexing thread -
nothing helps to match the base version throughput.
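For reference, here is a minimal sketch of the NRT setup described above, in case I'm wiring something up incorrectly. The directory path, NRTCachingDirectory cache sizes, field name, and refresh/commit intervals are illustrative assumptions, not our exact config:

```java
import java.nio.file.Files;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtSketch {
    public static void main(String[] args) throws Exception {
        // Wrap the on-disk directory so small, freshly flushed segments stay in RAM.
        NRTCachingDirectory dir = new NRTCachingDirectory(
                FSDirectory.open(Files.createTempDirectory("nrt")), 5.0, 60.0);
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // SearcherManager pulls near-real-time readers straight from the writer.
        SearcherManager manager = new SearcherManager(writer, new SearcherFactory());

        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // Refresh the NRT reader every second (only newly flushed segments are opened).
        scheduler.scheduleAtFixedRate(() -> {
            try { manager.maybeRefresh(); } catch (Exception e) { throw new RuntimeException(e); }
        }, 1, 1, TimeUnit.SECONDS);
        // Commit (fsync) every 60 seconds, for durability only.
        scheduler.scheduleAtFixedRate(() -> {
            try { writer.commit(); } catch (Exception e) { throw new RuntimeException(e); }
        }, 60, 60, TimeUnit.SECONDS);

        // Index a document; it becomes visible after the next refresh, not after a commit.
        Document doc = new Document();
        doc.add(new TextField("body", "hello nrt world", Field.Store.NO));
        writer.addDocument(doc);
        manager.maybeRefreshBlocking(); // force a refresh just for this demo

        IndexSearcher searcher = manager.acquire();
        try {
            long hits = searcher.count(new TermQuery(new Term("body", "nrt")));
            System.out.println("hits=" + hits);
        } finally {
            manager.release(searcher); // always release acquired searchers
        }

        scheduler.shutdownNow();
        manager.close();
        writer.close();
        dir.close();
    }
}
```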

I'd appreciate any advice. Am I missing something obvious, or is the
expectation that NRT with less frequent commits would be more
performant/resource-efficient simply incorrect?

--
Regards,
Alex
