Re: Relative cpu cost of fetching term frequency during scoring

Adrien Grand Tue, 20 Jun 2023 12:37:18 -0700

Lucene has logic to only evaluate a subset of the matching documents when
retrieving top-k hits. This leverages the Scorer#getMaxScore API. If you
never implemented it on your custom query, then you never took advantage of
dynamic pruning anyway. I wrote a bit more about it
<https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand>
a few years ago if you're curious.
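
To make the idea concrete, here is a toy model of dynamic pruning in plain Java. It is only a sketch and not Lucene code: the class BlockMaxPruningSketch, the Block and Result types, and the scoring formula are all made up for illustration. The assumption is that documents are grouped into blocks and each block carries an upper bound on the score of any document in it, which is roughly the kind of information a max-score API exposes. Once the top-k heap is full, a block whose bound cannot beat the current k-th best score is skipped without scoring its documents.

```java
import java.util.PriorityQueue;

// Toy model of dynamic pruning. NOT Lucene code; every name here is hypothetical.
final class BlockMaxPruningSketch {

    /** A block of postings: per-document term frequencies plus a score upper bound. */
    static final class Block {
        final int[] freqs;
        final float maxScore; // upper bound on score(freq) for any doc in this block

        Block(int[] freqs) {
            this.freqs = freqs;
            float max = 0f;
            for (int f : freqs) {
                max = Math.max(max, score(f));
            }
            this.maxScore = max;
        }
    }

    /** Top-k scores in descending order, plus how many documents were actually scored. */
    static final class Result {
        final float[] scores;
        final int scored;

        Result(float[] scores, int scored) {
            this.scores = scores;
            this.scored = scored;
        }
    }

    /** Stand-in scoring function (an assumption): increases and saturates with freq. */
    static float score(int freq) {
        return freq / (freq + 1.2f);
    }

    static Result topK(Block[] blocks, int k, boolean prune) {
        int scored = 0;
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: head is the k-th best
        for (Block block : blocks) {
            // Skip the whole block if even its best possible score is not competitive.
            if (prune && heap.size() == k && block.maxScore <= heap.peek()) {
                continue;
            }
            for (int f : block.freqs) {
                scored++;
                float s = score(f);
                if (heap.size() < k) {
                    heap.add(s);
                } else if (s > heap.peek()) {
                    heap.poll();
                    heap.add(s);
                }
            }
        }
        // Drain the min-heap from the back so the output is in descending order.
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return new Result(out, scored);
    }
}
```

Calling topK with prune set to true and then false on the same blocks should return the same top scores, with fewer documents scored in the pruned run whenever some block's bound falls below the k-th best score. The skip is only safe if maxScore really is an upper bound for every document in the block.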


On Tue, Jun 20, 2023 at 6:58 PM Vimal Jain <[email protected]> wrote:

> Thanks, Adrien, for the quick response.
> Yes, I am replacing disjuncts across multiple fields with a single custom
> term query over the merged field.
> Can you please provide more details on what you mean by dynamic pruning
> in the context of a custom term query?
>
> On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, <[email protected]> wrote:
>
> > Intuitively, replacing a disjunction across multiple fields with a single
> > term query should always be faster.
> >
> > You're saying that you're storing the type of token as part of the term
> > frequency. This doesn't sound like something that would play well with
> > dynamic pruning, so I wonder if this is the reason why you are seeing
> > slower queries. But since you mentioned custom term queries, maybe you
> > never actually took advantage of dynamic pruning?
> >
> > On Tue, Jun 20, 2023 at 10:30 AM Vimal Jain <[email protected]> wrote:
> >
> > > OK, sorry, I realized that I need to provide more context.
> > > We used to create a Lucene query consisting of custom term queries
> > > for different fields, and based on the type of field, we assigned a
> > > boost that was used in scoring.
> > > Now we want to get rid of the separate fields: instead of creating
> > > multiple term queries, we create only one term query for the merged
> > > field. The scorer of this term query (on the merged field) uses the
> > > custom term frequency info (stored during indexing) to deduce the type
> > > of token, and hence the score that we were using earlier.
> > > So the performance drop is relative to the earlier implementation (with
> > > multiple term queries).
> > >
> > >
> > > *Thanks and Regards,*
> > > *Vimal Jain*
> > >
> > >
> > > On Tue, Jun 20, 2023 at 1:01 PM Adrien Grand <[email protected]> wrote:
> > >
> > > > You say you observed a performance drop. What are you comparing
> > > > against?
> > > >
> > > > On Tue, Jun 20, 2023, 08:59, Vimal Jain <[email protected]> wrote:
> > > >
> > > > > Note: I am using Lucene 7.7.3.
> > > > >
> > > > > *Thanks and Regards,*
> > > > > *Vimal Jain*
> > > > >
> > > > >
> > > > > On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > > I want to understand whether fetching the term frequency of a term
> > > > > > during scoring is a relatively CPU-bound operation.
> > > > > > Context: I am storing a custom term frequency during indexing and
> > > > > > later using it for scoring at query execution time (in the Scorer's
> > > > > > score() method). I noticed a performance drop in my application and
> > > > > > I suspect it's because of this change.
> > > > > > Any insight or related articles for reference would be appreciated.
> > > > > >
> > > > > >
> > > > > > *Thanks and Regards,*
> > > > > > *Vimal Jain*
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Adrien
> >
>


-- 
Adrien
