Lucene has logic to evaluate only a subset of the matching documents when retrieving top-k hits. This relies on the Scorer#getMaxScore API. If you never implemented it in your custom query, then you never took advantage of dynamic pruning anyway. I wrote a bit more about it a few years ago at <https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand> if you're curious.
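
To make the idea concrete, here is a toy sketch in plain Java. It is not Lucene code and does not use Lucene's API: the class `BlockMaxPruningSketch`, its `Block` and `Result` types, and the `topK` method are all hypothetical names made up for illustration. It assumes postings are grouped into blocks that each carry an upper bound on the score of any document inside, which is roughly the kind of information a scorer exposes through a max-score method. Once k hits are collected, a block whose bound cannot beat the weakest collected hit is skipped without scoring its documents.

```java
import java.util.PriorityQueue;

// Illustrative sketch only: a toy model of block-level pruning, not Lucene's implementation.
final class BlockMaxPruningSketch {

    /** Hypothetical block of postings with an upper bound on the scores it contains. */
    static final class Block {
        final float[] scores;  // per-document scores, in doc ID order
        final float maxScore;  // upper bound on any score in this block

        Block(float... scores) {
            this.scores = scores;
            float max = Float.NEGATIVE_INFINITY;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /** The best scores found (ascending) and the number of documents actually scored. */
    static final class Result {
        final float[] topScores;
        final int docsScored;

        Result(float[] topScores, int docsScored) {
            this.topScores = topScores;
            this.docsScored = docsScored;
        }
    }

    static Result topK(Block[] blocks, int k, boolean prune) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        int scored = 0;
        for (Block block : blocks) {
            // With k hits collected, skip blocks that cannot produce a better hit.
            if (prune && heap.size() == k && block.maxScore <= heap.peek()) {
                continue;
            }
            for (float score : block.scores) {
                scored++;
                if (heap.size() < k) {
                    heap.add(score);
                } else if (score > heap.peek()) {
                    heap.poll();
                    heap.add(score);
                }
            }
        }
        float[] top = new float[heap.size()];
        for (int i = 0; i < top.length; i++) {
            top[i] = heap.poll();
        }
        return new Result(top, scored);
    }
}
```

Calling `topK(blocks, k, true)` and `topK(blocks, k, false)` on the same input returns the same top scores, but the pruned run can report a smaller `docsScored`. If every block reported an unbounded maximum, the skip condition would never fire, which is the sense in which a query that provides no useful max score gets no benefit from pruning.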

On Tue, Jun 20, 2023 at 6:58 PM Vimal Jain <vkj...@gmail.com> wrote:

> Thanks Adrien for quick response.
> Yes, I am replacing disjuncts across multiple fields with single custom
> term query over merged field.
> Can you please provide more details on what do you mean by dynamic pruning
> in context of custom term query?
>
> On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, <jpou...@gmail.com> wrote:
>
> > Intuitively replacing a disjunction across multiple fields with a single
> > term query should always be faster.
> >
> > You're saying that you're storing the type of token as part of the term
> > frequency. This doesn't sound like something that would play well with
> > dynamic pruning, so I wonder if this is the reason why you are seeing
> > slower queries. But since you mentioned custom term queries, maybe you
> > never actually took advantage of dynamic pruning?
> >
> > On Tue, Jun 20, 2023 at 10:30 AM Vimal Jain <vkj...@gmail.com> wrote:
> >
> > > Ok, sorry, I realized that I need to provide more context.
> > > So we used to create a Lucene query which consisted of custom term
> > > queries for different fields and based on the type of field, we used
> > > to assign a boost that would be used in scoring.
> > > Now we want to get rid of different fields and instead of creating
> > > multiple term queries, we create only 1 term query for the merged
> > > field and the scorer of this term query (on merged field) makes use
> > > of custom term frequency info to deduce type of token (during
> > > indexing we store this info) and hence the score that we were using
> > > earlier.
> > > So perf drop is observed in reference to earlier implementation (with
> > > multiple term queries).
> > >
> > > *Thanks and Regards,*
> > > *Vimal Jain*
> > >
> > > On Tue, Jun 20, 2023 at 1:01 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > > You say you observed a performance drop, what are you comparing
> > > > against?
> > > >
> > > > On Tue, Jun 20, 2023, 08:59, Vimal Jain <vkj...@gmail.com> wrote:
> > > >
> > > > > Note - I am using Lucene 7.7.3
> > > > >
> > > > > *Thanks and Regards,*
> > > > > *Vimal Jain*
> > > > >
> > > > > On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain <vkj...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > > I want to understand if fetching the term frequency of a term
> > > > > > during scoring is relatively cpu bound operation?
> > > > > > Context - I am storing custom term frequency during indexing
> > > > > > and later using it for scoring during query execution time (in
> > > > > > Scorer's score() method). I noticed a performance drop in my
> > > > > > application and I suspect it's because of this change.
> > > > > > Any insight or related articles for reference would be
> > > > > > appreciated.
> > > > > >
> > > > > > *Thanks and Regards,*
> > > > > > *Vimal Jain*
> >
> > --
> > Adrien
>

--
Adrien