Indeed! Thank you for all the helpful suggestions, especially re: HNSW,
which is indeed costly to index from my point of view. I am surprised
by how much time is spent in SparseBitSet; perhaps a full (non-sparse)
bitset is called for, although I had initially shied away from that
since this indexing is already quite RAM-intensive. Also, I did not
know about Math.fma; I wonder if we can speed up the dot product with
it. And your observation that vector indexing dominates the indexing
benchmark is fair - we may want to consider indexing vectors more
sparsely to trim that.
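For context, a minimal sketch of what an FMA-based dot product might look like (this is just an illustration, not Lucene's actual implementation; the class and method names are made up):

```java
// Sketch: dot product using Math.fma (fused multiply-add).
// Math.fma(a, b, c) computes a*b + c with a single rounding and may
// compile down to a hardware FMA instruction on supporting CPUs.
// Hypothetical class/method names for illustration only.
public class FmaDot {
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            // Accumulate a[i] * b[i] into sum with a single rounding step.
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {4f, 5f, 6f};
        // 1*4 + 2*5 + 3*6 = 32
        System.out.println(dotProduct(x, y)); // 32.0
    }
}
```

Whether this actually helps would depend on the JIT and the hardware; on CPUs without an FMA instruction, Math.fma can fall back to a much slower software path, so it would need benchmarking.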

On Sat, Jan 16, 2021 at 5:18 AM Adrien Grand <[email protected]> wrote:
>
> This is very cool, thanks for sharing Anton!
>
> Le ven. 15 janv. 2021 à 23:40, Anton Hägerstrand <[email protected]> a écrit :
>>
>> Hello everyone!
>>
>> I recently wrote a blog post which looks into profiling data of the Lucene 
>> nightly benchmarks. I emailed Michael McCandless (the maintainer of the 
>> benchmarks) and he suggested that I post about it here, so here we go.
>>
>> The post is available at https://blunders.io/posts/lucene-bench-2021-01-10. 
>> I have published some more periodic profiling data at 
>> https://blunders.io/lucene-bench - this is not really nightly, but one might 
>> be able to spot changes over time.
>>
>> If you have any feedback or questions, I'll happily listen and answer.
>>
>> best regards,
>> Anton Hägerstrand
>>
>> PS. If no one beats me to it, I'll open a PR for the TermGroupSelector 
>> thing ;)

