Thank you Gautam!

> Yeah, it seems like luceneutil is not stressing the code path that
> ElasticSearch's benchmarks are?

Yes, as far as I understand it - though it might just be that I don't
understand luceneutil well enough. I believe that in order to see the
performance difference, numerical range queries or numerical sorting would
have to be involved - the more documents matched, the larger the
difference. This is what the relevant benchmark operations from Elastic do.

> So it seems like switching over from an iterative visit(int docID) call
> to a bulk visit(DocIdSetIterator iterator) gave us these gains? Cool!

Yes, it seems like it, based on this benchmark.

> I am running Amazon Product Search's benchmarks to see if the change is
> needle moving for us.

Thank you, much appreciated!

> Small suggestion on the blog...

Thank you for the feedback! The post is definitely a bit confusing; I
struggled with keeping it clear. I will try to make some edits to clarify
what conclusions can be drawn after each section.

/Anton

On Sat, 2 Mar 2024 at 00:30, Gautam Worah <worah.gau...@gmail.com> wrote:

> Hi Anton,
>
> It took me a while to get through the blog post, and I suspect I will
> need to read through it a couple more times to understand it fully.
> Thanks for writing up something so detailed. I learned a lot! (especially
> about JVM inlining of methods).
>
> > I have not been able to reproduce the speedup with luceneutil - I
> > suspect that the default tasks in it would not trigger this code path
> > that much.
>
> Yeah, it seems like luceneutil is not stressing the code path that
> ElasticSearch's benchmarks are?
>
> > I tried changing DocIdsWriter::readInts32 (and readDelta16) to instead
> > call the IntersectVisitor with a DocIdSetIterator, to reduce the number
> > of virtual calls. In the benchmark setup by Elastic [2] I saw a
> > decrease in execution time of 35-45% for range queries and numerical
> > sorting with this patch applied.
>
> So it seems like switching over from an iterative visit(int docID) call
> to a bulk visit(DocIdSetIterator iterator) gave us these gains? Cool!
>
> > I am running Amazon Product Search's benchmarks to see if the change
> > is needle moving for us.
>
> Small suggestion on the blog: The JVM inlining, ElasticSearch
> short-circuiting/opto causing a difference in performance could've been a
> blog on its own, part 1 maybe.. I got confused when the blog shifted from
> the performance differences between ElasticSearch and OpenSearch, to how
> you ended up improving Lucene.
>
> Regards,
> Gautam Worah.
>
>
> On Fri, Mar 1, 2024 at 2:42 AM Anton Hägerstrand <an...@blunders.io>
> wrote:
>
>> Hi everyone, long time lurker here.
>>
>> I recently investigated Elasticsearch/OpenSearch performance in a blog
>> post [1], and saw some interesting behavior of numerical range queries
>> and numerical sorting with regard to inlining and virtual calls.
>>
>> In short, the DocIdsWriter::readInts method seems to get much slower if
>> it is called with 3 or more implementations of IntersectVisitor during
>> the JVM lifetime. I believe that this is due to
>> IntersectVisitor.visit(docid) being heavily inlined with 2 or fewer
>> IntersectVisitor implementations, while becoming a virtual call with 3
>> or more.
>>
>> This leads to two interesting points wrt Lucene:
>>
>> 1) For benchmarks, warm-ups should not only be done to trigger speedups
>> by the JIT, but also to put the JVM in a realistic production state. For
>> the BKDPointTree, this means at least 3 implementations of the
>> IntersectVisitor. I'm not sure if this is top of mind when writing
>> Lucene benchmarks?
>> 2) I tried changing DocIdsWriter::readInts32 (and readDelta16) to
>> instead call the IntersectVisitor with a DocIdSetIterator, to reduce the
>> number of virtual calls. In the benchmark setup by Elastic [2] I saw a
>> decrease in execution time of 35-45% for range queries and numerical
>> sorting with this patch applied.
>>
>> PR: https://github.com/apache/lucene/pull/13149
>>
>> I have not been able to reproduce the speedup with luceneutil - I
>> suspect that the default tasks in it would not trigger this code path
>> that much.
>>
>> If you want to understand more of my line of thinking, consider
>> skimming through the blog post [1].
>>
>> [1] https://blunders.io/posts/es-benchmark-4-inlining
>> [2] https://github.com/elastic/elasticsearch-opensearch-benchmark
>>
>> best regards,
>> Anton H
>>
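[Editor's note] The inlining cliff and the bulk-visit fix discussed in this thread can be sketched outside Lucene with a toy visitor interface. This is a hypothetical illustration, not Lucene's actual API: the real code uses PointValues.IntersectVisitor and DocIdSetIterator, while the sketch below uses a plain int[] block and invented names (Visitor, readIntsPerDoc, readIntsBulk).

```java
/**
 * Toy sketch of the megamorphic call-site problem. With 2 or fewer loaded
 * implementations of Visitor, HotSpot can inline visitor.visit(...) at the
 * call site in readIntsPerDoc; once a 3rd implementation is seen, the call
 * site becomes megamorphic and each doc id costs a virtual dispatch. The
 * bulk path pays one virtual dispatch per block instead of one per doc,
 * and the per-id loop then runs inside the concrete implementation.
 */
public class MegamorphicSketch {

    // Stand-in for Lucene's IntersectVisitor: one call per doc id.
    interface Visitor {
        void visit(int docId);

        // Bulk variant: one virtual dispatch for a whole block of doc ids.
        default void visitAll(int[] docIds, int count) {
            for (int i = 0; i < count; i++) {
                visit(docIds[i]);
            }
        }
    }

    static final class CountingVisitor implements Visitor {
        long count;
        public void visit(int docId) { count++; }
    }

    static final class SummingVisitor implements Visitor {
        long sum;
        public void visit(int docId) { sum += docId; }
    }

    static final class MaxVisitor implements Visitor {
        int max = Integer.MIN_VALUE;
        public void visit(int docId) { max = Math.max(max, docId); }
    }

    // Per-doc path: this call site sees 3 receiver types over the JVM's
    // lifetime, so the JIT stops inlining it (megamorphic dispatch).
    static void readIntsPerDoc(int[] docIds, Visitor visitor) {
        for (int docId : docIds) {
            visitor.visit(docId);
        }
    }

    // Bulk path: a single virtual call per block instead of one per doc.
    static void readIntsBulk(int[] docIds, Visitor visitor) {
        visitor.visitAll(docIds, docIds.length);
    }

    public static void main(String[] args) {
        int[] docIds = new int[1024];
        for (int i = 0; i < docIds.length; i++) docIds[i] = i;

        // Three distinct implementations, as in a warmed-up production JVM -
        // the state point 1) of the original mail says benchmarks should reach.
        CountingVisitor c = new CountingVisitor();
        SummingVisitor s = new SummingVisitor();
        MaxVisitor m = new MaxVisitor();

        readIntsPerDoc(docIds, c);
        readIntsBulk(docIds, s);
        readIntsBulk(docIds, m);

        System.out.println(c.count); // 1024
        System.out.println(s.sum);   // 0 + 1 + ... + 1023 = 523776
        System.out.println(m.max);   // 1023
    }
}
```

Both paths produce identical results; the difference is only in how many virtual dispatches the hot loop pays once the call site has seen 3+ receiver types, which is the effect the 35-45% numbers in the thread are attributed to.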