Re: Unnecessary float[256] allocation on every (non-scoring) BM25Scorer

Robert Muir Tue, 02 May 2023 13:14:18 -0700

On Tue, May 2, 2023 at 3:24 PM Michael Froh <msf...@gmail.com> wrote:
>
> > This seems ok if it isn't invasive. I still feel like something is
> > "off" if you are seeing GC time from 1KB-per-segment allocation. Do
> > you have way too many segments?
>
> From what I saw, it's 1KB per "leaf query" to create the BM25Scorer instance 
> (at the Weight level), but then that BM25Scorer is shared across all scorer 
> (DISI) instances for all segments. So it doesn't scale with segment count. It 
> looks like the old logic used to allocate a SimScorer per segment, so this is 
> a big improvement in that regard (for scoring clauses, since the non-scoring 
> clauses had a super-lightweight SimScorer).
>
> In this particular case, they're running these gnarly machine-generated 
> BooleanQuery trees with at least 512 non-scoring TermQuery clauses (across a 
> bunch of different fields, so TermInSetQuery isn't an option). From what I 
> can see, each of those TermQueries produces a TermWeight that holds a 
> BM25Scorer that holds yet another instance of this float[256] array, for 
> 512KB+ of these caches per running query. It's definitely only going to be an 
> issue for folks who are flying close to the max clause count.
>


Yeah, but the same could be said for buffers like this one:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java#L311-L312
So I'm actually still confused why this float[256] stands out in your
measurements vs two long[128]'s. Maybe it's a profiler ghost?
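
For a rough sense of scale, here is a back-of-the-envelope sketch of the
numbers in this thread. It counts only array payload bytes (JVM object
headers are ignored), and the class and method names are made up for
illustration; they are not Lucene APIs. The 512-clause count and the array
lengths are the ones quoted above.

```java
public class CacheFootprint {
    // Payload bytes of a float[] of the given length (array header ignored).
    public static long floatArrayBytes(int length) {
        return (long) length * Float.BYTES;
    }

    // Payload bytes of a long[] of the given length (array header ignored).
    public static long longArrayBytes(int length) {
        return (long) length * Long.BYTES;
    }

    public static void main(String[] args) {
        int clauses = 512; // non-scoring TermQuery clauses, per the thread

        // One float[256] cache per BM25Scorer, one scorer per TermWeight.
        long perScorer = floatArrayBytes(256);
        long perQuery = clauses * perScorer;

        // Two long[128] buffers, as described for the postings reader.
        long perPostingsBuffers = 2 * longArrayBytes(128);

        System.out.println("float[256] per scorer:  " + perScorer + " bytes");
        System.out.println("float[256] x 512:       " + perQuery + " bytes");
        System.out.println("two long[128] buffers:  " + perPostingsBuffers + " bytes");
    }
}
```

So by this count the two long[128] buffers are twice the size of one
float[256], which is why I'd expect them to show up at least as much
wherever they get allocated per clause.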

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
