[GitHub] [lucene] uschindler commented on issue #12358: Optimize `count()` for BooleanQuery disjunction

via GitHub Sat, 10 Jun 2023 03:06:04 -0700


uschindler commented on issue #12358:
URL: https://github.com/apache/lucene/issues/12358#issuecomment-1585595163


   Hi. It looks like the DoQuery.java code is not written for a throughput
measurement. Instead, it runs all queries in a single thread, one after the
other, with nanoTime before and after each query (thanks for the fix, Mike). So
we measure exactly the duration of each query, and we should therefore also use
ParallelGC. The default G1GC works better when you hammer a server with many
threads, but if only one thread runs queries, ParallelGC is better.
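For illustration, here is a minimal Java sketch of the single-threaded timing pattern described above: each query runs on the calling thread, one after the other, bracketed by `System.nanoTime()`. It is not the actual DoQuery.java code. The class name, `measureSequentially`, and the busy-loop `simulatedQuery` are hypothetical stand-ins for real query execution (in Lucene that might be something like `IndexSearcher.count(query)`).

```java
import java.util.ArrayList;
import java.util.List;

public class SingleThreadLatency {

    // Sink for results so the JIT is less likely to drop the simulated work.
    static volatile long sink;

    // Hypothetical stand-in for executing one query.
    static long simulatedQuery(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i ^ (sum >>> 3);
        }
        return sum;
    }

    // Runs each query on the calling thread, one after the other, and records
    // its wall-clock duration in nanoseconds.
    static long[] measureSequentially(List<Runnable> queries) {
        long[] durationsNanos = new long[queries.size()];
        for (int i = 0; i < queries.size(); i++) {
            long start = System.nanoTime();
            queries.get(i).run();
            durationsNanos[i] = System.nanoTime() - start;
        }
        return durationsNanos;
    }

    public static void main(String[] args) {
        List<Runnable> queries = new ArrayList<>();
        for (int q = 1; q <= 5; q++) {
            final int iterations = q * 200_000;
            queries.add(() -> sink = simulatedQuery(iterations));
        }
        long[] durations = measureSequentially(queries);
        for (int i = 0; i < durations.length; i++) {
            System.out.printf("query %d: %d us%n", i, durations[i] / 1_000);
        }
    }
}
```

Following the advice to pass only GC and heap settings, one might run it as `java -XX:+UseParallelGC -Xmx2g SingleThreadLatency`; the 2g heap is an arbitrary example value.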
   
   Of course, a real-world benchmark should also measure throughput by hammering
a server with hundreds of parallel queries (many more than there are CPU cores)
to saturate all cores. In such throughput scenarios I have sometimes seen
single queries take a long time, so you also need to look at latency
percentiles.
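A rough sketch of such a throughput run, assuming a fixed thread pool oversubscribed relative to the number of cores (the factor of 8 is an arbitrary, hypothetical choice) and the same kind of hypothetical `simulatedQuery` workload. It reports overall queries per second plus nearest-rank latency percentiles. Note that each latency is measured from when a task starts executing, so time spent waiting in the pool's queue is not included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThroughputBenchmark {

    static volatile long sink;

    // Hypothetical stand-in for executing one query.
    static long simulatedQuery(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i ^ (sum >>> 3);
        }
        return sum;
    }

    // Nearest-rank percentile for p in (0, 100] over an ascending array.
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, Math.min(rank, sorted.length) - 1)];
    }

    // Submits all queries to a fixed pool, records how long each one ran,
    // and returns the latencies sorted ascending.
    static long[] runConcurrently(List<Runnable> queries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Runnable query : queries) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime(); // queue wait is excluded
                    query.run();
                    return System.nanoTime() - start;
                }));
            }
            long[] latencies = new long[futures.size()];
            for (int i = 0; i < latencies.length; i++) {
                latencies[i] = futures.get(i).get();
            }
            Arrays.sort(latencies);
            return latencies;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = cores * 8; // hypothetical oversubscription factor
        List<Runnable> queries = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            queries.add(() -> sink = simulatedQuery(50_000));
        }
        long wallStart = System.nanoTime();
        long[] sorted = runConcurrently(queries, threads);
        double wallSeconds = (System.nanoTime() - wallStart) / 1e9;
        System.out.printf("throughput: %.1f queries/s%n", queries.size() / wallSeconds);
        for (double p : new double[] {50, 90, 99, 100}) {
            System.out.printf("p%.0f: %d us%n", p, percentile(sorted, p) / 1_000);
        }
    }
}
```

Looking at p99 and max next to the median is what exposes the occasional slow single query that an average would hide.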
   
   I know that Lucene performs very well in throughput measurements.
   
   I know this comment goes far beyond the scope of this issue, but we should
really look at scenarios other than measuring the duration of a single query.
   
   About vectors: this does not apply here because no vector search is
involved. Still, with modern Java versions like JDK 20, the warmup time in
combination with ParallelGC is higher due to tiered compilation.
   
   Please use Java 20 for benchmarks so that we also see the benefits of mmap,
especially with indexes optimized to a single segment. Also enable ParallelGC:
it is not real-world, but neither is the benchmark.
   
   Please do not pass any extra JVM args, except GC and heap size.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
