> This seems ok if it isn't invasive. I still feel like something is
> "off" if you are seeing GC time from 1KB-per-segment allocation. Do
> you have way too many segments?
From what I saw, it's 1KB per "leaf query" to create the BM25Scorer
instance (at the Weight level), but then that BM25Scorer is shared across
all Scorer (DISI) instances for all segments, so it doesn't scale with
segment count. It looks like the old logic used to allocate a SimScorer per
segment, so this is a big improvement in that regard (for scoring clauses,
since the non-scoring clauses had a super-lightweight SimScorer).

In this particular case, they're running these gnarly machine-generated
BooleanQuery trees with at least 512 non-scoring TermQuery clauses (across
a bunch of different fields, so TermInSetQuery isn't an option). From what
I can see, each of those TermQueries produces a TermWeight that holds a
BM25Scorer that holds yet another instance of this float[256] array, for
512KB+ of these caches per running query. It's definitely only going to be
an issue for folks who are flying close to the max clause count.

> One last thought: we should re-check if the cache is still needed :) I
> think decoding norms used to be more expensive in the past. This cache
> is now only precomputing part of the bm25 formula to save some
> add/multiply/divide.

Yeah -- when I saw the cache calculation, it reminded me of precomputed
tables of trigonometric functions in the demoscene. I could try inlining
those calculations and measuring the impact with the luceneutil benchmarks.
(I've put a couple of rough sketches at the bottom of this mail -- one of
what I understand the cache to be precomputing vs. inlining it, and one of
the shared non-scoring array idea.)

On Tue, May 2, 2023 at 11:34 AM Robert Muir <rcm...@gmail.com> wrote:

> On Tue, May 2, 2023 at 12:49 PM Michael Froh <msf...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I was looking into a customer issue where they noticed some increased
> > GC time after upgrading from Lucene 7.x to 9.x. After taking some heap
> > dumps from both systems, the big difference was tracked down to the
> > float[256] allocated (as a norms cache) when creating a BM25Scorer (in
> > BM25Similarity.scorer()).
> >
> > The change seems to have come in with
> > https://github.com/apache/lucene/commit/8fd7ead940f69a892dfc951a1aa042e8cae806c1,
> > which removed some of the special-case logic around the "non-scoring
> > similarity" embedded in IndexSearcher (returned in the false case from
> > the old IndexSearcher#scorer(boolean needsScores)).
> >
> > While I really like that we no longer have that special-case logic in
> > IndexSearcher, we now have the issue that every time we create a new
> > TermWeight (or other Weight) it allocates a float[256], even if the
> > TermWeight doesn't need scores. Also, I think it's the exact same
> > float[256] for all non-scoring weights, since it's being computed using
> > the same "all 1s" CollectionStatistics and TermStatistics.
> >
> > (For the record, yes, the queries in question have an obscene number of
> > TermQueries, so 1024 bytes times lots of TermWeights, times multiple
> > queries running concurrently makes lots of heap allocation.)
> >
> > I'd like to submit a patch to fix this, but I'm wondering what approach
> > to take. One option I'm considering is precomputing a singleton
> > float[256] for the non-scoring case (where CollectionStatistics and
> > TermStatistics are all 1s). That would have the least functional
> > impact, but would let all non-scoring clauses share the same array. Is
> > there a better way to tackle this?
> >
>
> This seems ok if it isn't invasive. I still feel like something is
> "off" if you are seeing GC time from 1KB-per-segment allocation. Do
> you have way too many segments?
>
> Originally (for various similar reasons) there was a place in the API
> to do this, so it would only happen per-Weight instead of per-Scorer,
> which was the SimWeight that got eliminated by the commit you point
> to. But I'd love if we could steer clear of that complexity:
> simplifying the API here was definitely the right move. Its been more
> than 5 years since this change was made, and this is the first
> complaint i've heard about the 1KB, which is why i asked about your
> setup.
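
As promised above, here's a rough, self-contained sketch of what I
understand the float[256] cache to be precomputing, next to what "inlining"
those calculations per document might look like. This is not the actual
Lucene source: the class and method names are mine, and decodeLength() is
only a placeholder for however the encoded norm byte really maps back to a
document length.

// Rough sketch (not the actual Lucene code) of the norms cache idea: for
// each possible encoded norm byte, precompute the length-normalization
// part of BM25 so that scoring is just an array load plus a couple of
// float ops per hit. decodeLength() is a hypothetical stand-in.
class Bm25CacheSketch {
  final float k1 = 1.2f;
  final float b = 0.75f;

  float[] buildCache(float avgdl) {
    float[] cache = new float[256];
    for (int i = 0; i < cache.length; i++) {
      // 1 / (k1 * ((1 - b) + b * dl / avgdl)), precomputed per norm byte
      cache[i] = 1f / (k1 * ((1 - b) + b * decodeLength(i) / avgdl));
    }
    return cache;
  }

  // cached scoring: one array load per document
  float scoreCached(float weight, float freq, int encodedNorm, float[] cache) {
    float normInverse = cache[encodedNorm & 0xFF];
    return weight - weight / (1f + freq * normInverse);
  }

  // "inlined" scoring: recompute the normalization per document instead
  float scoreInlined(float weight, float freq, int encodedNorm, float avgdl) {
    float norm = k1 * ((1 - b) + b * decodeLength(encodedNorm & 0xFF) / avgdl);
    return weight - weight / (1f + freq / norm);
  }

  // placeholder only; Lucene's real decoding of the norm byte is different
  float decodeLength(int encodedNorm) {
    return encodedNorm + 1;
  }
}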
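And here's a sketch of the "shared array for non-scoring clauses" option
from my original mail: since every non-scoring Weight builds its cache from
the same all-1s CollectionStatistics/TermStatistics (so the average field
length should come out to 1), the resulting float[256] is identical every
time and could be computed once and reused. NON_SCORING_CACHE and its
holder class are made-up names, not anything that exists in Lucene today.

// Hypothetical holder for a single shared cache for non-scoring clauses.
final class NonScoringCacheSketch {
  private static final float[] NON_SCORING_CACHE =
      new Bm25CacheSketch().buildCache(1f); // all-1s stats => avgdl of 1

  static float[] nonScoringCache() {
    return NON_SCORING_CACHE; // one shared instance instead of one per Weight
  }
}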