Thank you for all the helpful suggestions, especially, from my point of view, re: HNSW, which is indeed costly to index. I am surprised how much time is spent in SparseBitSet; perhaps a full (non-sparse) bitset is called for, although I had initially shied away from that since this indexing is already quite RAM-intensive. Also, I did not know about Math.fma; I wonder if we can speed up the dot product with it. And your observation that vector indexing dominates the indexing benchmark is fair - we may want to consider indexing vectors more sparsely to trim that.
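
For reference, here is a rough sketch of what an fma-based dot product could look like (just an illustration of the idea, not actual Lucene code; the method name and shape are made up):

  // Sketch only: accumulate a dot product with Math.fma (JDK 9+), which
  // computes a*b + c with a single rounding and can map to a hardware FMA
  // instruction on supporting CPUs.
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum = Math.fma(a[i], b[i], sum);
    }
    return sum;
  }

Whether that actually beats the plain multiply-add loop would need benchmarking on the nightly hardware.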
On Sat, Jan 16, 2021 at 5:18 AM Adrien Grand <[email protected]> wrote:
>
> This is very cool, thanks for sharing Anton!
>
> On Fri, Jan 15, 2021 at 11:40 PM Anton Hägerstrand <[email protected]> wrote:
>>
>> Hello everyone!
>>
>> I recently wrote a blog post which looks into profiling data of the Lucene
>> nightly benchmarks. I emailed Michael McCandless (the maintainer of the
>> benchmarks) and he suggested that I post about it here, so here we go.
>>
>> The post is available at https://blunders.io/posts/lucene-bench-2021-01-10.
>> I have published some more periodic profiling data at
>> https://blunders.io/lucene-bench - this is not really nightly, but one might
>> be able to spot changes over time.
>>
>> If you have any feedback or questions, I'll happily listen and answer.
>>
>> best regards,
>> Anton Hägerstrand
>>
>> PS. If no one beats me to it, I'll open a PR for the TermGroupSelector
>> thing ;)
