How many documents do you have in the index?
And can you show an example of a query?


From: java-user@lucene.apache.org At: 06/09/21 18:33:25
To: java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug

I have only two fields: one is a string and the other is a number (stored
as a string). I guess you can't go simpler than this.

I retrieve the hits, and my major bottleneck is the Lucene fuzzy search.


I take each word from the string, which usually has at most around 10 words,

and I build a fuzzy boolean query out of them.


By a simple query I mean a 10-word query like this.


By limit I mean that I want to stop the Lucene search at around 20 hits; I
don't want thousands of hits.


Best regards


On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:

> Hi Baris,
>
>> what if the user needs to limit the search process?
> What do you mean by 'limit'?
>
>> there should be a way to speed up Lucene then, if this is not possible,
>> since for some simple queries it takes half a second, which is too long.
> What do you mean by 'simple' query? There might be multiple reasons behind
> the slowness of a query that are unrelated to the search (for example, if you
> retrieve many documents and for each document you are extracting the content
> of many fields) - would you like to tell us a bit more about your use case?
>
> Regards,
> Diego
>
> From: java-user@lucene.apache.org At: 06/09/21 18:18:01
> To: java-user@lucene.apache.org
> Cc: baris.ka...@oracle.com
> Subject: Re: Potential bug
>
> Thanks Adrien, but the two numbers are too far apart.
>
> I think the algorithm needs to be revised.
>
>
> what if the user needs to limit the search process?
>
> that leaves no control.
>
> there should be a way to speed up Lucene then, if this is not possible,
>
> since for some simple queries it takes half a second, which is too long.
>
> Best regards
>
>
> On 6/9/21 1:13 PM, Adrien Grand wrote:
>> Hi Baris,
>>
>> totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
>>
>> The problem is that Lucene cannot directly identify the top matching
>> documents for a given query. The strategy it adopts is to start collecting
>> hits naively in doc ID order and to progressively raise the bar on the
>> minimum score that is required for a hit to be competitive in order to skip
>> non-competitive documents. So it's expected that Lucene still collects 100s
>> or 1000s of hits, even though the collector is configured to only compute
>> the top 10 hits.
>>
>> On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
>>
>>> Hi,-
>>>
>>> I think this is a potential bug.
>>>
>>>
>>> This time I set totalHitsThreshold to 10, and totalHits is reported as
>>> 1655, but I get only 10 results in total.
>>>
>>> I think this suggests that there might be a bug in the
>>> TopScoreDocCollector algorithm.
>>>
>>>
>>> Best regards
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
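
The collection strategy described in Adrien's reply above can be illustrated with a small toy model. This is only a sketch in plain Java, not Lucene's code: the class TopKSketch, the method collect, and the blockSize parameter are all hypothetical names, and the per-block maximum score is an assumed stand-in for whatever score upper bounds a real index provides. It shows why the number of hits counted can be far larger than the k results returned, and why the threshold acts as a minimum: every hit is counted until the threshold is reached, and only after that may non-competitive blocks be skipped.

```java
import java.util.PriorityQueue;

// Toy model only: hypothetical names, not Lucene's implementation.
class TopKSketch {

    /**
     * Walks scores in doc ID order and keeps the k best.
     * Returns {hitsCounted, hitsKept}.
     */
    static int[] collect(float[] scores, int k, int threshold, int blockSize) {
        // Min-heap: the head is the weakest score currently in the top k.
        PriorityQueue<Float> top = new PriorityQueue<>();
        int counted = 0;
        for (int start = 0; start < scores.length; start += blockSize) {
            int end = Math.min(start + blockSize, scores.length);

            // Assumed upper bound on any score in this block
            // (a real index would have this precomputed).
            float blockMax = Float.NEGATIVE_INFINITY;
            for (int i = start; i < end; i++) {
                blockMax = Math.max(blockMax, scores[i]);
            }

            // Skipping is allowed only once the threshold has been counted
            // and the top-k queue is full.
            boolean maySkip = counted >= threshold && top.size() == k;
            if (maySkip && blockMax <= top.peek()) {
                continue; // nothing in this block can enter the top k
            }

            for (int i = start; i < end; i++) {
                counted++;
                if (top.size() < k) {
                    top.add(scores[i]);
                } else if (scores[i] > top.peek()) {
                    top.poll(); // raise the bar: drop the weakest kept hit
                    top.add(scores[i]);
                }
            }
        }
        return new int[] {counted, top.size()};
    }

    /** Scores that decrease with doc ID: the best hits come first. */
    static float[] descending(int n) {
        float[] s = new float[n];
        for (int i = 0; i < n; i++) s[i] = n - i;
        return s;
    }

    /** Scores that increase with doc ID: every block stays competitive. */
    static float[] ascending(int n) {
        float[] s = new float[n];
        for (int i = 0; i < n; i++) s[i] = i;
        return s;
    }

    public static void main(String[] args) {
        int[] best = collect(descending(1000), 10, 10, 10);
        int[] worst = collect(ascending(1000), 10, 10, 10);
        System.out.println("descending: counted=" + best[0] + " kept=" + best[1]);
        System.out.println("ascending:  counted=" + worst[0] + " kept=" + worst[1]);
    }
}
```

In this model, both runs keep exactly 10 hits, but the number counted depends on where the good hits sit in doc ID order: with the best scores first, everything after the first block is skipped, while with the best scores last, no block can ever be skipped and all 1000 hits are counted. A reported hit count well above the number of returned results is therefore expected behaviour in this model, not a sign of a bug.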
