+1 to Adrien. Let's keep the tone neutral.
On Mon, 14 Jun 2021, 16:00 Adrien Grand, <jpou...@gmail.com> wrote:
> Baris, you called out an insult from Alessandro and your replies suggest anger, but I couldn't see an insult from Alessandro actually.
>
> +1 to Alessandro's call to make the tone softer on this discussion.
>
> On Mon, Jun 14, 2021 at 11:28 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > Hi Baris,
> > first of all apologies for having misspelled your name, definitely, it was not meant as an insult.
> > Secondly, your tone is not acceptable on this mailing list (or anywhere else).
> > You must remember that we, committers, are operating on a volunteering basis, contributing code and helping people in our free time purely driven by passion.
> > Respect is fundamental, we are not here to be treated aggressively.
> >
> > Regards
> >
> > --------------------------
> > Alessandro Benedetti
> > Apache Lucene/Solr Committer
> > Director, R&D Software Engineer, Search Consultant
> >
> > www.sease.io
> >
> > On Fri, 11 Jun 2021 at 17:10, <baris.ka...@oracle.com> wrote:
> > > Let me guide to a professional answer to the below email:
> > >
> > > Hi Baris,
> > >
> > > Since You mentioned You did all the performance study on your application and still believe that the bottleneck is the fuzzy search api from Lucene, it would be best to time the application for:
> > >
> > > * matching phase (identifying candidates from the corpus of documents)
> > > * or in the ranking phase (scoring them by relevance)?
> > >
> > > Maybe this will help speed up further.
> > >
> > > Also, what do You mean by "what if the user needs to limit the search process"? can you elaborate?
> > >
> > > Cheers
> > >
> > > My answer would be:
> > >
> > > i cant access the Lucene code so how can i time these two cases please?
> > >
> > > i mean by that sentence that when i see the hits are good i would like to limit the number of hits.
> > >
> > > this is more like a professional conversation please. Thanks.
> > >
> > > Best regards
> > >
> > > On 6/11/21 11:57 AM, Alessandro Benedetti wrote:
> > > > Hi Bazir,
> > > > this feels like an X Y problem [1].
> > > > Can you express what is your original user requirement?
> > > > Most of the time, at the cost of indexing time/space you may get quicker query times.
> > > > Also, you should identify where you are wasting most of your time, in the matching phase (identifying candidates from the corpus of documents) or in the ranking phase (scoring them by relevance)?
> > > >
> > > > TopScoreDocCollector is quite a solid class, there's a ton to study, analyze and experiment before raising the alarm of a bug :)
> > > >
> > > > Also didn't understand this:
> > > > "what if the user needs to limit the search process?"
> > > > Can you elaborate?
> > > >
> > > > Cheers
> > > >
> > > > [1] https://xyproblem.info
> > > >
> > > > On Wed, 9 Jun 2021 at 19:08, <baris.ka...@oracle.com> wrote:
> > > > > Yes, i did those and i believe i am at the best level of performance now and it is not bad at all but i want to make it much better.
> > > > >
> > > > > i see like a linear drop in timings when i go to a lower number of words but let me do that quick study again.
> > > > >
> > > > > Fuzzy search is always expensive but that seems to suit my needs best.
> > > > >
> > > > > Thanks Diego for these great questions and i already explored them. But thanks again.
> > > > >
> > > > > Best regards
> > > > >
> > > > > On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > > > > > I have never used fuzzy search but from the documentation it seems very expensive, and if you do it on 10 terms and 1M documents it seems very very very expensive.
> > > > > > Are you using the default 'fuzziness' parameter? (0.5) - It might end up exploring a lot of documents, did you try to play with that parameter?
> > > > > > Have you tried to see how the performance changes if you do not use fuzzy (just to see if it is fuzzy that introduces the slowdown)?
> > > > > > Or what happens to performance if you do fuzzy with 1, 2, 5 terms instead of 10?
> > > > > >
> > > > > > From: java-user@lucene.apache.org At: 06/09/21 18:56:31 To: java-user@lucene.apache.org, baris.ka...@oracle.com
> > > > > > Subject: Re: Potential bug
> > > > > >
> > > > > > i cant reveal those details i am very sorry. but it is more than 1 million.
> > > > > > let me tell that i have a lot of code that processes results from lucene but the bottleneck is lucene fuzzy search.
> > > > > >
> > > > > > Best regards
> > > > > >
> > > > > > On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > > > > > > How many documents do you have in the index?
> > > > > > > and can you show an example of query?
> > > > > > >
> > > > > > > From: java-user@lucene.apache.org At: 06/09/21 18:33:25 To: java-user@lucene.apache.org, baris.ka...@oracle.com
> > > > > > > Subject: Re: Potential bug
> > > > > > >
> > > > > > > i have only two fields, one string and the other a number (stored as string), i guess you cant go simpler than this.
> > > > > > >
> > > > > > > i retrieve the hits and my major bottleneck is lucene fuzzy search.
> > > > > > >
> > > > > > > i take each word from the string (usually at most around 10 words) and build a fuzzy boolean query out of them.
> > > > > > >
> > > > > > > a simple query is like this 10 word query.
> > > > > > >
> > > > > > > limit means i want to stop lucene search around 20 hits, i dont want thousands of hits.
> > > > > > >
> > > > > > > Best regards
> > > > > > >
> > > > > > > On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> > > > > > > > Hi Baris,
> > > > > > > >
> > > > > > > > > what if the user needs to limit the search process?
> > > > > > > > What do you mean by 'limit'?
> > > > > > > >
> > > > > > > > > there should be a way to speed up lucene then if this is not possible, since for some simple queries it takes half a second which is too long.
> > > > > > > > What do you mean by 'simple' query? there might be multiple reasons behind slowness of a query that are unrelated to the search (for example, if you retrieve many documents and for each document you are extracting the content of many fields) - would you like to tell us a bit more about your use case?
> > > > > > > > Regards,
> > > > > > > > Diego
> > > > > > > >
> > > > > > > > From: java-user@lucene.apache.org At: 06/09/21 18:18:01 To: java-user@lucene.apache.org
> > > > > > > > Cc: baris.ka...@oracle.com
> > > > > > > > Subject: Re: Potential bug
> > > > > > > >
> > > > > > > > Thanks Adrien, but the differences are too far apart.
> > > > > > > >
> > > > > > > > I think the algorithm needs to be revised.
> > > > > > > >
> > > > > > > > what if the user needs to limit the search process?
> > > > > > > >
> > > > > > > > that leaves no control.
> > > > > > > >
> > > > > > > > there should be a way to speed up lucene then if this is not possible, since for some simple queries it takes half a second which is too long.
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > >
> > > > > > > > On 6/9/21 1:13 PM, Adrien Grand wrote:
> > > > > > > > > Hi Baris,
> > > > > > > > >
> > > > > > > > > totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
> > > > > > > > > The problem is that Lucene cannot directly identify the top matching documents for a given query. The strategy it adopts is to start collecting hits naively in doc ID order and to progressively raise the bar on the minimum score that is required for a hit to be competitive in order to skip non-competitive documents. So it's expected that Lucene still collects 100s or 1000s of hits, even though the collector is configured to only compute the top 10 hits.
> > > > > > > > >
> > > > > > > > > On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
> > > > > > > > > > Hi,-
> > > > > > > > > >
> > > > > > > > > > i think this is a potential bug
> > > > > > > > > >
> > > > > > > > > > i set this time totalHitsThreshold to 10 and i get totalHits reported as 1655 but i get 10 results in total.
> > > > > > > > > >
> > > > > > > > > > I think this suggests that there might be a bug with the TopScoreDocCollector algorithm.
> > > > > > > > > >
> > > > > > > > > > Best regards
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Adrien
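Adrien's explanation of why `totalHitsThreshold` is a minimum rather than a maximum can be illustrated with a toy model. This is a hypothetical plain-Java sketch, not Lucene's `TopScoreDocCollector`: it visits matching documents in doc ID order, keeps the best `k` in a min-heap whose head is the current minimum competitive score, and once more than `totalHitsThreshold` hits have been counted it skips hits that cannot beat that minimum, so the reported total becomes a lower bound. Real Lucene skips non-competitive documents in blocks using score upper bounds rather than inspecting each one; the class and method names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of top-k hit collection with a total-hits threshold; not Lucene's TopScoreDocCollector. */
final class TopKCollectorSketch {

    /** A matching document and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** The best k hits, how many hits were counted, and whether that count is exact. */
    static final class Result {
        final List<Hit> topHits;
        final int totalHits;
        final boolean exact;
        Result(List<Hit> topHits, int totalHits, boolean exact) {
            this.topHits = topHits; this.totalHits = totalHits; this.exact = exact;
        }
    }

    /**
     * Visits matching documents in doc ID order (every array entry is a match).
     * Hits are counted exactly until more than totalHitsThreshold have been seen; after
     * that, hits that cannot beat the weakest kept hit are skipped without being counted.
     */
    static Result collect(float[] scoresInDocIdOrder, int k, int totalHitsThreshold) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        // Min-heap on score: the head is the weakest of the hits kept so far.
        PriorityQueue<Hit> queue = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        int totalHits = 0;
        boolean exact = true;
        for (int doc = 0; doc < scoresInDocIdOrder.length; doc++) {
            float score = scoresInDocIdOrder[doc];
            boolean full = queue.size() == k;
            // Minimum competitive score: a new hit must beat it to enter the top k.
            float minCompetitive = full ? queue.peek().score : Float.NEGATIVE_INFINITY;
            if (full && totalHits > totalHitsThreshold && score <= minCompetitive) {
                exact = false; // skipped without counting, so the total is now a lower bound
                continue;
            }
            totalHits++;
            if (!full) {
                queue.add(new Hit(doc, score));
            } else if (score > minCompetitive) {
                queue.poll();                   // evict the weakest kept hit...
                queue.add(new Hit(doc, score)); // ...which raises the bar for later hits
            }
        }
        List<Hit> top = new ArrayList<>(queue);
        top.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return new Result(top, totalHits, exact);
    }
}
```

With scores 10, 9, ..., 1, `k = 3` and a threshold of 5, the sketch returns docs 0, 1 and 2 but reports 6 hits as a lower bound; with a threshold of 100 it counts all 10 exactly. It has the same shape as the report of 10 returned results next to a total of 1655: the total reflects hits visited, not hits returned.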
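Diego's remark that fuzzy search on 10 terms over 1M documents is expensive comes down to term expansion: each fuzzy clause is rewritten into the indexed terms within some edit distance of the query word, and every such term adds postings to visit. In Lucene 8.x the knobs on `FuzzyQuery` are `maxEdits` (0, 1 or 2, default 2), `prefixLength` and `maxExpansions`; the `0.5` value Diego mentions matches the float `minimumSimilarity` of older releases. The sketch below is a hypothetical plain-Java illustration, not Lucene's automaton-based matching: it uses textbook Levenshtein distance (Lucene by default also counts a transposition as one edit, this sketch does not) over a small made-up term list, to show how the number of expanded terms grows with `maxEdits` and with the number of words in the query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fuzzy term expansion; not Lucene's implementation. */
final class FuzzyExpansionSketch {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds the row just computed
        }
        return prev[b.length()];
    }

    /** Dictionary terms that a fuzzy clause for {@code word} would expand to. */
    static List<String> expand(String word, List<String> termDictionary, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary) {
            if (levenshtein(word, term) <= maxEdits) matches.add(term);
        }
        return matches;
    }

    /** Total expanded terms when every whitespace-separated word becomes its own fuzzy clause. */
    static int totalExpansions(String queryText, List<String> termDictionary, int maxEdits) {
        int total = 0;
        for (String word : queryText.trim().split("\\s+")) {
            total += expand(word, termDictionary, maxEdits).size();
        }
        return total;
    }
}
```

With the toy dictionary `lucene, lucerne, lucent, licence, luge, search`, the word `lucene` expands to 1 term at `maxEdits` 0, 3 terms at 1 and 4 terms at 2. A 10-word query pays this cost once per word.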
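Baris clarifies that "limit" means stopping the search at around 20 hits. Stopping after the first N matches in doc ID order is cheap, but those are not necessarily the N best-scoring matches, which is why a top-N collector keeps visiting documents as Adrien describes. The hypothetical plain-Java sketch below contrasts the two contracts on toy data in which a score of 0 stands for "no match"; it is not a Lucene collector, and the names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Contrasts "first N matches" with "best N matches" on toy scores; not Lucene code. */
final class EarlyTerminationSketch {

    /** First n matching doc IDs in doc ID order; stops scanning as soon as n are found. */
    static List<Integer> firstN(float[] scores, int n) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < scores.length && docs.size() < n; doc++) {
            if (scores[doc] > 0) docs.add(doc); // score 0 means "does not match" in this toy model
        }
        return docs;
    }

    /** Best n matching doc IDs by score; has to look at every matching document. */
    static List<Integer> topN(float[] scores, int n) {
        return IntStream.range(0, scores.length)
                .filter(doc -> scores[doc] > 0)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer doc) -> scores[doc]).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

For scores `{0, 1, 0, 2, 9, 0, 8}` and `n = 2`, `firstN` stops after doc 3 and returns docs 1 and 3, while `topN` returns docs 4 and 6, the two highest-scoring matches.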
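On Baris's question of how to time the matching and ranking phases without access to the Lucene code: the two can be compared from the outside by timing two calls on the same query, one that only counts matches and one that also scores and ranks the top hits. With Lucene 8.x, `IndexSearcher.count(Query)` and `IndexSearcher.search(Query, int)` are natural candidates for those two calls, although `count` may take shortcuts for some query types, so the difference is only a rough estimate. The harness below is a hypothetical plain-Java sketch (`PhaseTimerSketch` and `medianNanos` are illustrative names) that reports the median of repeated runs after a warm-up; for careful measurements a dedicated benchmark harness such as JMH is more reliable.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Minimal wall-clock timing harness; what each task runs is up to the caller. */
final class PhaseTimerSketch {

    /** Holds task results so the JIT cannot treat the timed work as dead code. */
    static volatile Object sink;

    /** Runs the task `warmup` times untimed, then returns the median of `runs` timed calls in nanoseconds. */
    static long medianNanos(Supplier<?> task, int warmup, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be >= 1");
        for (int i = 0; i < warmup; i++) {
            sink = task.get(); // warm up JIT compilation and caches before measuring
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.get();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // the median is less sensitive to GC pauses than the mean
    }
}
```

Because `Supplier.get` cannot throw checked exceptions, a task wrapping Lucene calls that throw `IOException` would need to catch it, for example by rethrowing it as `java.io.UncheckedIOException`.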