RE: NRT readers and overall indexing/querying throughput

Uwe Schindler Sun, 08 Aug 2021 14:20:41 -0700

Hi,

In general, NRT indexing throughput is always a bit slower than normal
indexing, as it reopens readers and needs to flush segments more often (and
therefore you should use NRTCachingDirectory). So 10% slower indexing throughput
is quite normal. You can improve this by parallelizing, but during each refresh
you still have a small delay when SearcherManager reopens the readers.


Search speed is mostly the same, because while indexing, most of the segments
don't change and can be reused after a reopen; only new (and small) segments are
cold. Merged segments also need warming, so generally you only see small spikes
in search latency when newly merged and possibly huge "cold" segments go
live.
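To illustrate which segments are "cold" after a reopen, here is a toy model using only the JDK. The class and method names are hypothetical, and segment names such as `_0` only mimic Lucene's naming. In Lucene itself, per-segment reuse happens inside the reader reopen. A warmer for newly merged segments can be registered with `IndexWriterConfig.setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer)`, which runs on the merged segment before NRT readers can see it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ColdSegments {

    // Builds an ordered set of segment names, e.g. segs("_0", "_1").
    static Set<String> segs(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    // In this model, segments visible after a reopen that were not visible
    // before are "cold": nothing has been loaded or cached for them yet.
    // Segments present in both views are reused and stay warm.
    static Set<String> coldSegments(Set<String> before, Set<String> after) {
        Set<String> cold = new LinkedHashSet<>(after);
        cold.removeAll(before);
        return cold;
    }

    public static void main(String[] args) {
        Set<String> v1 = segs("_0", "_1", "_2");
        // A refresh after some indexing: one small flushed segment appears.
        Set<String> v2 = segs("_0", "_1", "_2", "_3");
        // A merge replaces _1, _2 and _3 with one larger segment _4.
        Set<String> v3 = segs("_0", "_4");

        System.out.println("cold after refresh: " + coldSegments(v1, v2)); // [_3]
        System.out.println("cold after merge:   " + coldSegments(v2, v3)); // [_4]
    }
}
```

A plain refresh usually adds only a small cold segment. A merge can replace several warm segments with one large cold one, which matches the latency spikes described above.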

Of course, if you use more parallel threads during indexing, you will also see a
slowdown in search performance.

When doing NRT, always use NRTCachingDirectory; for "normal bulk indexing",
MMapDirectory alone is fine.
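NRTCachingDirectory wraps a delegate directory, for example `new NRTCachingDirectory(new MMapDirectory(path), maxMergeSizeMB, maxCachedMB)`. It keeps small, newly flushed or merged segments in RAM so that frequent reopens don't hit the disk with many tiny files. The class below is not that implementation. It is a simplified, hypothetical model of the caching decision that assumes segment sizes are known up front. The real class takes the size estimate from the write's `IOContext`, and details may vary by Lucene version. The limits 5.0 and 60.0 are the values used in the class's javadoc example.

```java
public class NrtCachePolicy {

    // Model of the per-segment choice: cache in RAM only if the segment is
    // small enough AND the cache still has room; otherwise the files go
    // straight to the wrapped (delegate) directory.
    static boolean cacheInRam(double segmentMB, double cachedMB,
                              double maxMergeSizeMB, double maxCachedMB) {
        return segmentMB <= maxMergeSizeMB && segmentMB + cachedMB <= maxCachedMB;
    }

    public static void main(String[] args) {
        double maxMergeSizeMB = 5.0;
        double maxCachedMB = 60.0;
        // Small flushed segment, cache mostly empty: kept in RAM.
        System.out.println(cacheInRam(1.5, 10.0, maxMergeSizeMB, maxCachedMB));  // true
        // Large merged segment: written to the delegate directory.
        System.out.println(cacheInRam(20.0, 10.0, maxMergeSizeMB, maxCachedMB)); // false
        // Small segment, but the cache is nearly full: written to the delegate.
        System.out.println(cacheInRam(4.0, 58.0, maxMergeSizeMB, maxCachedMB));  // false
    }
}
```

This is why the wrapper pays off with frequent NRT reopens, which produce many small segments, and matters little for bulk indexing, where segments are large anyway.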

I don't fully understand your expectations, but everything you describe looks
quite normal. The main reason to use NRT indexing is shorter turnaround times
by not doing expensive commits. And that's what you see: shorter turnaround,
while indexing and search performance go down depending on the refresh rate.
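The point about turnaround comes from separating the refresh schedule from the commit schedule. Below is a minimal sketch of the setup from the quoted message (refresh every second, commit every 60 seconds), using only `java.util.concurrent`. `IoTask` and `RefreshCommitScheduler` are hypothetical names. In a Lucene application, the refresh task would typically call `SearcherManager.maybeRefresh()` and the commit task would call `IndexWriter.commit()`. Both can throw `IOException`, hence the checked-exception-friendly interface. Lucene also provides `ControlledRealTimeReopenThread` for periodic reopens, which may be the better choice in practice.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RefreshCommitScheduler implements AutoCloseable {

    // Hypothetical stand-in for calls that throw checked exceptions,
    // e.g. searcherManager.maybeRefresh() or writer.commit().
    @FunctionalInterface
    interface IoTask {
        void run() throws Exception;
    }

    private final ScheduledExecutorService exec = Executors.newScheduledThreadPool(2);

    RefreshCommitScheduler(IoTask refresh, long refreshMs, IoTask commit, long commitMs) {
        // Frequent, cheaper refreshes make new documents visible;
        // rare, expensive commits make them durable.
        exec.scheduleWithFixedDelay(guard("refresh", refresh), refreshMs, refreshMs, TimeUnit.MILLISECONDS);
        exec.scheduleWithFixedDelay(guard("commit", commit), commitMs, commitMs, TimeUnit.MILLISECONDS);
    }

    // A periodic task that throws is cancelled by the executor, so catch and
    // report failures to keep the schedule alive.
    private static Runnable guard(String name, IoTask task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                System.err.println(name + " failed: " + e);
            }
        };
    }

    @Override
    public void close() {
        exec.shutdown(); // stop scheduling new runs
        try {
            exec.awaitTermination(5, TimeUnit.SECONDS); // let a running task finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs both schedules for runMs and returns {refreshCount, commitCount}.
    static int[] demo(long refreshMs, long commitMs, long runMs) {
        AtomicInteger refreshes = new AtomicInteger();
        AtomicInteger commits = new AtomicInteger();
        try (RefreshCommitScheduler s = new RefreshCommitScheduler(
                refreshes::incrementAndGet, refreshMs, commits::incrementAndGet, commitMs)) {
            Thread.sleep(runMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {refreshes.get(), commits.get()};
    }

    public static void main(String[] args) {
        // Scaled-down intervals so the demo finishes quickly.
        int[] counts = demo(10, 100, 550);
        System.out.println("refreshes=" + counts[0] + " commits=" + counts[1]);
    }
}
```

New documents become searchable after roughly one refresh interval, while the costly commit runs only occasionally. The price is the extra flushing and reopening work described above.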

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Alexander Lukyanchikov <[email protected]>
> Sent: Wednesday, August 4, 2021 4:43 AM
> To: [email protected]
> Subject: NRT readers and overall indexing/querying throughput
> 
> Hello everyone,
> 
> We are considering switching from regular to NRT readers, hoping it would
> improve overall indexing/querying throughput and also optimize the
> turnaround time.
> I did some benchmarks, mostly to understand how much benefit we can get and
> make sure I'm implementing everything correctly.
> 
> To my surprise, no matter how I tweak it, our indexing throughput is 10%
> lower with NRT, and query throughput (running in parallel with indexing) is
> pretty much the same. I do see almost a 5x turnaround time improvement
> though.
> Maybe I have the wrong expectations, and less frequent commits with NRT refresh
> were not intended to improve overall performance?
> 
> Some details about the tests -
> The base implementation commits and refreshes a regular reader every second.
> The NRT implementation commits every 60 seconds and refreshes the NRT reader
> every second.
> The indexing rate is about 23 Mb/sec, query rate ~300 rps (text search with
> avg 50ms latency). Document size is about 35 Kb.
> A 36-core machine is used for the tests, and I don't see a big difference in
> JVM metrics between the tests. Also, there is no obvious bottleneck in
> CPU/memory/disk utilization (everything is way below 100%).
> NRT readers are implemented using SearcherManager, the same as the
> implementation in the Lucene benchmark
> <https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/NRTPerfTest.java>
> repository.
> With NRT, commit latency is about 3 sec, average refresh latency is 150ms.
> In the base approach, commit latency is about 500 ms, refresh 300 ms.
> I tried NRTCachingDirectory (with MMapDirectory and NIOFSDirectory), insert
> vs. update workloads, `applyAllDeletes=false`, and a single indexing thread;
> nothing helped match the base version's throughput.
> 
> I'd appreciate any advice. Am I missing something obvious, or is the
> expectation that NRT with less frequent commits is going to be more
> performant/resource-efficient incorrect?
> 
> --
> Regards,
> Alex
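Here is a back-of-the-envelope check of the latencies reported in the question, using only those numbers. It assumes each call blocks for its reported average latency, calls never overlap, and the averages are representative. In real Lucene, commits and refreshes run alongside indexing threads. The result is therefore wall-clock time spent inside those calls, not a prediction of indexing throughput. The class name is hypothetical.

```java
public class CommitRefreshBudget {

    // Milliseconds per second of wall-clock time spent inside a periodic call
    // that takes latencyMs and runs every intervalMs (integer math, no rounding surprises).
    static long msPerSecond(long latencyMs, long intervalMs) {
        return latencyMs * 1000 / intervalMs;
    }

    public static void main(String[] args) {
        // Base: commit (~500 ms) and refresh (~300 ms) every second.
        long base = msPerSecond(500, 1_000) + msPerSecond(300, 1_000);
        // NRT: refresh (~150 ms) every second, commit (~3 s) every 60 seconds.
        long nrt = msPerSecond(150, 1_000) + msPerSecond(3_000, 60_000);
        System.out.println("base: " + base + " ms/s, NRT: " + nrt + " ms/s");
    }
}
```

It prints `base: 800 ms/s, NRT: 200 ms/s`. That is roughly four times less time inside commit and refresh calls with NRT, which fits the faster turnaround, but the saving does not by itself imply higher indexing throughput.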


