This is a bit surprising. Can you share the profiler output (e.g. a screenshot) so we can see what is slow within the `PostingsEnum#freq` call?
`PostingsEnum#freq` may need to decode a block of freqs, but I would generally not expect it to be 5x slower than decoding doc IDs for the same block.

On Thu, Jun 22, 2023 at 6:00 AM Vimal Jain <vkj...@gmail.com> wrote:

> I profiled the new code and found that the following API call is the most time-consuming:
> org.apache.lucene.index.PostingsEnum#freq
> If I comment out this call and use a random integer instead for testing purposes, perf is at least 5x compared to the old code.
> Are there any thoughts on why term frequency calls on PostingsEnum are that slow?
>
> *Thanks and Regards,*
> *Vimal Jain*
>
> On Wed, Jun 21, 2023 at 1:43 PM Adrien Grand <jpou...@gmail.com> wrote:
>
> > As far as your performance problem is concerned, I don't know. Can you compare the number of documents that need to be evaluated in both cases, e.g. by running `IndexSearcher#count` on your two queries? If they're similar, can you run your new query under a profiler to figure out what its bottleneck is?
> >
> > Regarding migration to a newer major version, there is a MIGRATE.txt that gives some advice:
> > https://github.com/apache/lucene/blob/releases/lucene-solr/8.0.0/lucene/MIGRATE.txt
> >
> > On Wed, Jun 21, 2023 at 8:54 AM Vimal Jain <vkj...@gmail.com> wrote:
> >
> > > Thanks Adrien, I had a look at your blog post. It looks like Scorer#getMaxScore was added in Lucene 8.0; I am using 7.7.3.
> > > A side question: is there any resource to help migrate to a newer major version? I see a lot of APIs changed from v7 to v8.
> > >
> > > *Thanks and Regards,*
> > > *Vimal Jain*
> > >
> > > On Wed, Jun 21, 2023 at 1:08 AM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > > Lucene has logic to only evaluate a subset of the matching documents when retrieving top-k hits. This leverages the Scorer#getMaxScore API. If you never implemented it on your custom query, then you never took advantage of dynamic pruning anyway. I wrote a bit more about it <https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand> a few years ago if you're curious.
> > > >
> > > > On Tue, Jun 20, 2023 at 6:58 PM Vimal Jain <vkj...@gmail.com> wrote:
> > > >
> > > > > Thanks Adrien for the quick response.
> > > > > Yes, I am replacing disjuncts across multiple fields with a single custom term query over a merged field.
> > > > > Can you please provide more details on what you mean by dynamic pruning in the context of a custom term query?
> > > > >
> > > > > On Tue, 20 Jun, 2023, 9:45 pm Adrien Grand, <jpou...@gmail.com> wrote:
> > > > >
> > > > > > Intuitively, replacing a disjunction across multiple fields with a single term query should always be faster.
> > > > > >
> > > > > > You're saying that you're storing the type of token as part of the term frequency. This doesn't sound like something that would play well with dynamic pruning, so I wonder if this is the reason why you are seeing slower queries. But since you mentioned custom term queries, maybe you never actually took advantage of dynamic pruning?
> > > > > >
> > > > > > On Tue, Jun 20, 2023 at 10:30 AM Vimal Jain <vkj...@gmail.com> wrote:
> > > > > >
> > > > > > > Ok, sorry, I realized that I need to provide more context.
> > > > > > > We used to create a Lucene query which consisted of custom term queries for different fields and, based on the type of field, we used to assign a boost that would be used in scoring.
> > > > > > > Now we want to get rid of the different fields and, instead of creating multiple term queries, create only one term query for the merged field. The scorer of this term query (on the merged field) makes use of custom term frequency info to deduce the type of token (we store this info during indexing) and hence the score that we were using earlier.
> > > > > > > So the perf drop is observed in reference to the earlier implementation (with multiple term queries).
> > > > > > >
> > > > > > > *Thanks and Regards,*
> > > > > > > *Vimal Jain*
> > > > > > >
> > > > > > > On Tue, Jun 20, 2023 at 1:01 PM Adrien Grand <jpou...@gmail.com> wrote:
> > > > > > >
> > > > > > > > You say you observed a performance drop. What are you comparing against?
> > > > > > > >
> > > > > > > > On Tue, Jun 20, 2023 at 08:59, Vimal Jain <vkj...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Note - I am using Lucene 7.7.3
> > > > > > > > >
> > > > > > > > > *Thanks and Regards,*
> > > > > > > > > *Vimal Jain*
> > > > > > > > >
> > > > > > > > > On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain <vkj...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > I want to understand whether fetching the term frequency of a term during scoring is a relatively CPU-bound operation.
> > > > > > > > > > Context - I am storing a custom term frequency during indexing and later using it for scoring at query execution time (in the Scorer's score() method). I noticed a performance drop in my application and I suspect it's because of this change.
> > > > > > > > > > Any insight or related articles for reference would be appreciated.
> > > > > > > > > >
> > > > > > > > > > *Thanks and Regards,*
> > > > > > > > > > *Vimal Jain*
> > > > > >
> > > > > > --
> > > > > > Adrien
> > > >
> > > > --
> > > > Adrien
> >
> > --
> > Adrien

--
Adrien
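
To make the remark at the top of this reply about decoding "a block of freqs" more concrete, here is a toy Java model of block-wise frequency decoding. It is only a sketch and not Lucene's code: `ToyPostings`, `BLOCK_SIZE`, and `freqBlockDecodes` are made-up names, the block size is arbitrary, and the "decode" step is just an array copy. The only point it illustrates is that, if freqs are decoded one block at a time, the decode cost is paid once per block and every other `freq()` call within that block is a buffer read.

```java
/** Toy model of block-encoded postings. Hypothetical; NOT Lucene's implementation. */
final class ToyPostings {
    // Hypothetical block size, kept tiny so the example is easy to trace.
    static final int BLOCK_SIZE = 4;

    private final int[] docs;   // stand-in for the encoded doc IDs
    private final int[] freqs;  // stand-in for the encoded frequencies
    private final int[] freqBuffer = new int[BLOCK_SIZE];
    private int pos = -1;               // index of the current posting
    private int decodedFreqBlock = -1;  // block currently held in freqBuffer
    int freqBlockDecodes = 0;           // counter kept for illustration only

    ToyPostings(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Advances to the next document, or returns -1 when exhausted. */
    int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : -1;
    }

    /**
     * Returns the frequency for the current document. Only valid after
     * nextDoc() has returned a document. The block is "decoded" the first
     * time it is needed and served from the buffer afterwards.
     */
    int freq() {
        int block = pos / BLOCK_SIZE;
        if (block != decodedFreqBlock) {
            int start = block * BLOCK_SIZE;
            int len = Math.min(BLOCK_SIZE, freqs.length - start);
            System.arraycopy(freqs, start, freqBuffer, 0, len); // the "decode" step
            decodedFreqBlock = block;
            freqBlockDecodes++;
        }
        return freqBuffer[pos % BLOCK_SIZE];
    }
}
```

In this model, iterating six postings and calling `freq()` on each one triggers two block decodes rather than six. If a profiler shows the real `freq()` call dominating, the interesting question is which part of it the time is attributed to.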