Returning large resultset is slow and resource intensive

Alexander Lukyanchikov Tue, 08 Mar 2022 00:22:30 -0800

Hello everyone,
For our use case, we need to run queries which return the full
matched result set. In some cases, this result set can be large (50k+
results out of 4 million total documents).
A performance test showed that just 4 threads running random queries, each
returning 50k results, drive Lucene to 100% CPU utilization on a 4-core
machine (profiler screenshot
<https://user-images.githubusercontent.com/6069066/157188814-fbd9d205-c2e4-45b6-b98d-b7622b6ac801.png>).
The query is very simple and contains only a single-term filter clause. All
unrelated parts of the application are disabled, no stored fields are
fetched, and GC is doing a minimal amount of work
<https://user-images.githubusercontent.com/6069066/157191646-eb8c5ccc-41c1-4af1-afcf-37d0c5f86054.png>.


My understanding is that fetching a large result set is not exactly
the best use case for Lucene, as explained here
<http://philosophyforprogrammers.blogspot.com/2010/09/lucene-performance.html>.
But I wonder whether there are ways to optimize this, for example by using a
special type of collector, in order to reduce CPU utilization?

Thank you,
Alex
