Thank you Gautam!

> Yeah, it seems like luceneutil is not stressing the code path that
> ElasticSearch's benchmarks are?

Yes, as far as I understand it - though it might just be that I don't
understand luceneutil well enough. I believe that in order to see the
performance difference, numerical range queries or numerical sorting would
have to be involved - the more documents matched, the larger the
difference. This is what the relevant benchmark operations from Elastic do.

> So it seems like switching over from an iterative visit(int docID) call
> to a bulk visit(DocIdSetIterator iterator) gave us these gains? Cool!

Yes, it seems like it, based on this benchmark.

> I am running Amazon Product Search's benchmarks to see if the change is
> needle moving for us.

Thank you, much appreciated!

> Small suggestion on the blog...

Thank you for the feedback! The post is definitely a bit confusing; I
struggled with keeping it clear. I will try to make some edits to clarify
what conclusions can be drawn after each section.

/Anton

On Sat, 2 Mar 2024 at 00:30, Gautam Worah <worah.gau...@gmail.com> wrote:

> Hi Anton,
>
> It took me a while to get through the blog post, and I suspect I will
> need to read through it a couple more times to understand it fully.
> Thanks for writing up something so detailed. I learned a lot! (especially
> about JVM inlining of methods).
>
> > I have not been able to reproduce the speedup with luceneutil - I
> > suspect that the default tasks in it would not trigger this code path
> > that much.
>
> Yeah, it seems like luceneutil is not stressing the code path that
> ElasticSearch's benchmarks are?
>
> > I tried changing DocIdsWriter::readInts32 (and readDelta16) to instead
> > call the IntersectVisitor with a DocIdSetIterator, to reduce the number
> > of virtual calls. In the benchmark setup by Elastic [2] I saw a
> > decrease in execution time of 35-45% for range queries and numerical
> > sorting with this patch applied.
>
> So it seems like switching over from an iterative visit(int docID) call
> to a bulk visit(DocIdSetIterator iterator) gave us these gains? Cool!
>
> > I am running Amazon Product Search's benchmarks to see if the change
> > is needle moving for us.
>
> Small suggestion on the blog: The JVM inlining, ElasticSearch
> short-circuiting/opto causing a difference in performance could've been a
> blog on its own, part 1 maybe.. I got confused when the blog shifted from
> the performance differences between ElasticSearch and OpenSearch, to how
> you ended up improving Lucene.
>
> Regards,
> Gautam Worah.
>
>
> On Fri, Mar 1, 2024 at 2:42 AM Anton Hägerstrand <an...@blunders.io>
> wrote:
>
>> Hi everyone, long time lurker here.
>>
>> I recently investigated Elasticsearch/OpenSearch performance in a blog
>> post [1], and saw some interesting behavior of numerical range queries
>> and numerical sorting with regard to inlining and virtual calls.
>>
>> In short, the DocIdsWriter::readInts method seems to get much slower if
>> it is called with 3 or more implementations of IntersectVisitor during
>> the JVM lifetime. I believe that this is due to
>> IntersectVisitor.visit(docid) being heavily inlined with 2 or fewer
>> IntersectVisitor implementations, while becoming a virtual call with 3
>> or more.
>>
>> This leads to two interesting points wrt Lucene:
>>
>> 1) For benchmarks, warm-ups should not only be done to trigger speedups
>> by the JIT, but also to put the JVM in a realistic production state. For
>> the BKDPointTree, this means at least 3 implementations of the
>> IntersectVisitor. I'm not sure if this is top of mind when writing
>> Lucene benchmarks?
>> 2) I tried changing DocIdsWriter::readInts32 (and readDelta16) to
>> instead call the IntersectVisitor with a DocIdSetIterator, to reduce the
>> number of virtual calls. In the benchmark setup by Elastic [2] I saw a
>> decrease in execution time of 35-45% for range queries and numerical
>> sorting with this patch applied.
>>
>> PR: https://github.com/apache/lucene/pull/13149
>>
>> I have not been able to reproduce the speedup with luceneutil - I
>> suspect that the default tasks in it would not trigger this code path
>> that much.
>>
>> If you want to understand more of my line of thinking, consider
>> skimming through the blog post [1].
>>
>> [1] https://blunders.io/posts/es-benchmark-4-inlining
>> [2] https://github.com/elastic/elasticsearch-opensearch-benchmark
>>
>> best regards,
>> Anton H
>>
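[Editor's note] The inlining cliff and the bulk-visit fix discussed in this thread can be sketched outside Lucene with a toy visitor interface. This is a hypothetical illustration, not Lucene's actual API: the real code uses PointValues.IntersectVisitor and DocIdSetIterator, while the sketch below uses a plain int[] block and invented names (Visitor, readIntsPerDoc, readIntsBulk).

```java
/**
 * Toy sketch of the megamorphic call-site problem. With 2 or fewer loaded
 * implementations of Visitor, HotSpot can inline visitor.visit(...) at the
 * call site in readIntsPerDoc; once a 3rd implementation is seen, the call
 * site becomes megamorphic and each doc id costs a virtual dispatch. The
 * bulk path pays one virtual dispatch per block instead of one per doc,
 * and the per-id loop then runs inside the concrete implementation.
 */
public class MegamorphicSketch {

    // Stand-in for Lucene's IntersectVisitor: one call per doc id.
    interface Visitor {
        void visit(int docId);

        // Bulk variant: one virtual dispatch for a whole block of doc ids.
        default void visitAll(int[] docIds, int count) {
            for (int i = 0; i < count; i++) {
                visit(docIds[i]);
            }
        }
    }

    static final class CountingVisitor implements Visitor {
        long count;
        public void visit(int docId) { count++; }
    }

    static final class SummingVisitor implements Visitor {
        long sum;
        public void visit(int docId) { sum += docId; }
    }

    static final class MaxVisitor implements Visitor {
        int max = Integer.MIN_VALUE;
        public void visit(int docId) { max = Math.max(max, docId); }
    }

    // Per-doc path: this call site sees 3 receiver types over the JVM's
    // lifetime, so the JIT stops inlining it (megamorphic dispatch).
    static void readIntsPerDoc(int[] docIds, Visitor visitor) {
        for (int docId : docIds) {
            visitor.visit(docId);
        }
    }

    // Bulk path: a single virtual call per block instead of one per doc.
    static void readIntsBulk(int[] docIds, Visitor visitor) {
        visitor.visitAll(docIds, docIds.length);
    }

    public static void main(String[] args) {
        int[] docIds = new int[1024];
        for (int i = 0; i < docIds.length; i++) docIds[i] = i;

        // Three distinct implementations, as in a warmed-up production JVM -
        // the state point 1) of the original mail says benchmarks should reach.
        CountingVisitor c = new CountingVisitor();
        SummingVisitor s = new SummingVisitor();
        MaxVisitor m = new MaxVisitor();

        readIntsPerDoc(docIds, c);
        readIntsBulk(docIds, s);
        readIntsBulk(docIds, m);

        System.out.println(c.count); // 1024
        System.out.println(s.sum);   // 0 + 1 + ... + 1023 = 523776
        System.out.println(m.max);   // 1023
    }
}
```

Both paths produce identical results; the difference is only in how many virtual dispatches the hot loop pays once the call site has seen 3+ receiver types, which is the effect the 35-45% numbers in the thread are attributed to.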