Hi Baris,
this feels like an XY problem [1].
Can you describe your original user requirement?
Most of the time you can trade indexing time/space for quicker query times.
You should also identify where most of your time is going: in the matching phase (identifying candidates from the corpus of documents) or in the ranking phase (scoring them by relevance)?
TopScoreDocCollector is quite a solid class; there's a ton to study, analyze and experiment with before raising the alarm of a bug :)
Also, I didn't understand this: "what if the user needs to limit the search process?"
Can you elaborate?

Cheers

[1] https://xyproblem.info
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io


On Wed, 9 Jun 2021 at 19:08, <baris.ka...@oracle.com> wrote:

> Yes, i did those and i believe i am at the best level of performance now
> and it is not bad at all but i want to make it much better.
>
> i see like a linear drop in timings when i go to a lower number of words
> but let me do that quick study again.
>
> Fuzzy search is always expensive but that seems to suit my needs best.
>
> Thanks Diego for these great questions and i already explored them. But
> thanks again.
>
> Best regards
>
> On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > I have never used fuzzy search but from the documentation it seems very
> > expensive, and if you do it on 10 terms and 1M documents it seems very
> > very very expensive.
> >
> > Are you using the default 'fuzzyness' parameter? (0.5) - It might end up
> > exploring a lot of documents, did you try to play with that parameter?
> >
> > Have you tried to see how the performance changes if you do not use fuzzy
> > (just to see if it is fuzzy that introduces the slowdown)?
> > Or what happens to performance if you do fuzzy with 1, 2, 5 terms
> > instead of 10?
> >
> > From: java-user@lucene.apache.org At: 06/09/21 18:56:31 To:
> > java-user@lucene.apache.org, baris.ka...@oracle.com
> > Subject: Re: Potential bug
> >
> > i can't reveal those details i am very sorry. but it is more than 1
> > million.
> >
> > let me tell that i have a lot of code that processes results from lucene
> > but the bottleneck is lucene fuzzy search.
> >
> > Best regards
> >
> > On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> >> How many documents do you have in the index?
> >> and can you show an example of query?
> >>
> >> From: java-user@lucene.apache.org At: 06/09/21 18:33:25 To:
> >> java-user@lucene.apache.org, baris.ka...@oracle.com
> >> Subject: Re: Potential bug
> >>
> >> i have only two fields one string the other is a number (stored as
> >> string), i guess you can't go simpler than this.
> >>
> >> i retrieve the hits and my major bottleneck is lucene fuzzy search.
> >>
> >> i take each word from the string which is usually around at most 10 words
> >>
> >> i build a fuzzy boolean query out of them.
> >>
> >> simple query is like this 10 word query.
> >>
> >> limit means i want to stop lucene search around 20 hits i don't want
> >> thousands of hits.
> >>
> >> Best regards
> >>
> >> On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> >>
> >>> Hi Baris,
> >>>
> >>>> what if the user needs to limit the search process?
> >>> What do you mean by 'limit'?
> >>>
> >>>> there should be a way to speedup lucene then if this is not possible,
> >>>> since for some simple queries it takes half a second which is too long.
> >>> What do you mean by 'simple' query? there might be multiple reasons behind
> >>> slowness of a query that are unrelated to the search (for example, if you
> >>> retrieve many documents and for each document you are extracting the content
> >>> of many fields) - would you like to tell us a bit more about your use case?
> >>> Regards,
> >>> Diego
> >>>
> >>> From: java-user@lucene.apache.org At: 06/09/21 18:18:01 To:
> >>> java-user@lucene.apache.org
> >>> Cc: baris.ka...@oracle.com
> >>> Subject: Re: Potential bug
> >>>
> >>> Thanks Adrien, but the differences are too far apart.
> >>>
> >>> I think the algorithm needs to be revised.
> >>>
> >>> what if the user needs to limit the search process?
> >>>
> >>> that leaves no control.
> >>>
> >>> there should be a way to speedup lucene then if this is not possible,
> >>>
> >>> since for some simple queries it takes half a second which is too long.
> >>>
> >>> Best regards
> >>>
> >>> On 6/9/21 1:13 PM, Adrien Grand wrote:
> >>>> Hi Baris,
> >>>>
> >>>> totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
> >>>>
> >>>> The problem is that Lucene cannot directly identify the top matching
> >>>> documents for a given query. The strategy it adopts is to start collecting
> >>>> hits naively in doc ID order and to progressively raise the bar about the
> >>>> minimum score that is required for a hit to be competitive in order to skip
> >>>> non-competitive documents. So it's expected that Lucene still collects 100s
> >>>> or 1000s of hits, even though the collector is configured to only compute
> >>>> the top 10 hits.
> >>>>
> >>>> On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
> >>>>
> >>>>> Hi,-
> >>>>>
> >>>>> i think this is a potential bug
> >>>>>
> >>>>> i set this time totalHitsThreshold to 10 and i get totalHits reported as
> >>>>> 1655 but i get 10 results in total.
> >>>>>
> >>>>> I think this suggests that there might be a bug with
> >>>>> TopScoreDocCollector algorithm.
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
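
For illustration, the behaviour Adrien describes in the quoted message (count hits in doc ID order, then raise the minimum competitive score and skip what cannot make the top N) can be sketched as a toy model in plain Java. This is only a sketch under stated assumptions and not Lucene's implementation: the class TopKSketch, its search method, the Hit and Result types and the synthetic scores are all hypothetical, and skipping is modelled per document, whereas a real index skips in coarser steps. It is meant to show only that the number of results returned and the number of hits counted are two separate things.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Toy model only: NOT Lucene code. Every name in this file is hypothetical. */
final class TopKSketch {

    /** A (doc, score) pair standing in for a scored hit. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** The top hits plus the number of hits that were actually counted. */
    static final class Result {
        final List<Hit> top;      // best hits, highest score first
        final int countedHits;    // hits visited and counted
        final boolean lowerBound; // true once skipping started, so the count is only a lower bound
        Result(List<Hit> top, int countedHits, boolean lowerBound) {
            this.top = top;
            this.countedHits = countedHits;
            this.lowerBound = lowerBound;
        }
    }

    /** scores[doc] is the score of document doc; documents are visited in doc ID order. */
    static Result search(float[] scores, int numHits, int totalHitsThreshold) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be >= 1");
        // Min-heap on score: the head is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        int counted = 0;
        boolean skipping = false;
        float minCompetitive = Float.NEGATIVE_INFINITY;
        for (int doc = 0; doc < scores.length; doc++) {
            float s = scores[doc];
            // Once skipping is on, non-competitive hits are neither kept nor counted.
            if (skipping && s <= minCompetitive) continue;
            counted++;
            if (heap.size() < numHits) {
                heap.add(new Hit(doc, s));
            } else if (s > heap.peek().score) {
                heap.poll();
                heap.add(new Hit(doc, s));
            }
            // The threshold is a minimum: skipping may only start after it has been exceeded.
            if (counted > totalHitsThreshold && heap.size() == numHits) {
                skipping = true;
                minCompetitive = heap.peek().score;
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc));
        return new Result(top, counted, skipping);
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        float[] scores = new float[100_000]; // synthetic scores, not real data
        for (int i = 0; i < scores.length; i++) scores[i] = random.nextFloat();
        Result r = search(scores, 10, 10);
        System.out.println("returned=" + r.top.size()
                + " counted=" + r.countedHits + (r.lowerBound ? "+" : ""));
    }
}
```

In this model the caller always gets at most numHits results, while the counted hits keep growing every time a later document beats the current weakest entry, so a count well above 10 alongside exactly 10 results is the expected outcome rather than a sign of a bug.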