How many documents do you have in the index? And can you show an example query?

From: java-user@lucene.apache.org At: 06/09/21 18:33:25
To: java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug

I have only two fields: one is a string, the other is a number (stored
as a string). I guess you can't go simpler than this.

I retrieve the hits, and my major bottleneck is the Lucene fuzzy search.
I take each word from the string, which is usually at most around 10
words, and build a fuzzy boolean query out of them. A simple query is
like this 10-word query.

By "limit" I mean that I want to stop the Lucene search at around 20
hits; I don't want thousands of hits.

Best regards

On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> Hi Baris,
>
>> what if the user needs to limit the search process?
> What do you mean by 'limit'?
>
>> there should be a way to speed up Lucene then if this is not possible,
>> since for some simple queries it takes half a second, which is too long.
> What do you mean by 'simple' query? There might be multiple reasons
> behind the slowness of a query that are unrelated to the search (for
> example, if you retrieve many documents and for each document you are
> extracting the content of many fields). Would you like to tell us a bit
> more about your use case?
>
> Regards,
> Diego
>
> From: java-user@lucene.apache.org At: 06/09/21 18:18:01
> To: java-user@lucene.apache.org
> Cc: baris.ka...@oracle.com
> Subject: Re: Potential bug
>
> Thanks Adrien, but the difference is too large.
>
> I think the algorithm needs to be revised.
>
> What if the user needs to limit the search process?
>
> That leaves no control.
>
> There should be a way to speed up Lucene then if this is not possible,
> since for some simple queries it takes half a second, which is too long.
>
> Best regards
>
>
> On 6/9/21 1:13 PM, Adrien Grand wrote:
>> Hi Baris,
>>
>> totalHitsThreshold is actually a minimum threshold, not a maximum
>> threshold.
>>
>> The problem is that Lucene cannot directly identify the top matching
>> documents for a given query. The strategy it adopts is to start
>> collecting hits naively in doc ID order and to progressively raise the
>> bar on the minimum score that is required for a hit to be competitive,
>> in order to skip non-competitive documents. So it's expected that
>> Lucene still collects 100s or 1000s of hits, even though the collector
>> is configured to only compute the top 10 hits.
>>
>> On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
>>
>>> Hi,
>>>
>>> I think this is a potential bug.
>>>
>>> This time I set totalHitsThreshold to 10, and I get totalHits
>>> reported as 1655, but I get 10 results in total.
>>>
>>> I think this suggests that there might be a bug in the
>>> TopScoreDocCollector algorithm.
>>>
>>> Best regards
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
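
A toy illustration of the behaviour Adrien describes in the thread, in plain Java with no Lucene dependency. This is a sketch under stated assumptions, not Lucene's implementation: TopKSketch, collect and Result are hypothetical names, and the toy skips on each document's exact score, whereas Lucene decides what to skip from score upper bounds and may therefore collect more hits than this model does. It is only meant to show why the reported hit count can exceed both the number of returned results and totalHitsThreshold: hits are counted exactly until the threshold is passed, and after that only documents beating the current k-th best score are collected, so the count becomes a lower bound.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical toy model of top-k collection with a hit-count threshold.
class TopKSketch {

    // Outcome of one toy search.
    static final class Result {
        final int countedHits;   // hits actually collected and counted
        final boolean exact;     // false once any document was skipped
        final float[] topScores; // best scores, highest first

        Result(int countedHits, boolean exact, float[] topScores) {
            this.countedHits = countedHits;
            this.exact = exact;
            this.topScores = topScores;
        }
    }

    // scoresInDocIdOrder: score of every matching document, in doc ID order.
    // k: number of top hits to keep.
    // totalHitsThreshold: how many hits to count before skipping may start.
    static Result collect(float[] scoresInDocIdOrder, int k, int totalHitsThreshold) {
        PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap: head is the k-th best score
        int counted = 0;
        boolean exact = true;

        for (float score : scoresInDocIdOrder) {
            // The bar only applies after the threshold is passed and k hits are held.
            boolean barActive = counted > totalHitsThreshold && top.size() == k;
            if (barActive && score <= top.peek()) {
                exact = false; // skipped: not competitive, so not counted
                continue;
            }
            counted++;
            if (top.size() < k) {
                top.add(score);
            } else if (score > top.peek()) {
                top.poll();     // evict the weakest hit, raising the bar
                top.add(score);
            }
        }

        // Drain the min-heap from the back so the array ends up highest first.
        float[] best = new float[top.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = top.poll();
        }
        return new Result(counted, exact, best);
    }

    public static void main(String[] args) {
        float[] scores = {5f, 1f, 4f, 2f, 8f, 3f, 9f, 1f, 7f, 6f};
        Result r = collect(scores, 2, 2);
        System.out.println("counted=" + r.countedHits + " exact=" + r.exact
                + " top=" + Arrays.toString(r.topScores));
    }
}
```

In this toy run, with k = 2 and a threshold of 2, five of the ten matching documents are counted and the count is flagged as a lower bound, while only two results are returned. With a threshold larger than the number of matches, all ten are counted exactly. In Lucene itself, the TotalHits object returned with the results carries a relation field (EQUAL_TO or GREATER_THAN_OR_EQUAL_TO) that plays the role of the exact flag here.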