Yes, I did those, and I believe I am at the best level of performance now.
It is not bad at all, but I want to make it much better.
I see a roughly linear drop in timings when I use fewer words, but
let me do that quick study again.
Fuzzy search is always expensive, but it seems to suit my needs best.
Thanks, Diego, for these great questions; I had already explored them, but
thanks again.
Best regards
On 6/9/21 2:04 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
I have never used fuzzy search, but from the documentation it seems very
expensive, and if you do it on 10 terms and 1M documents it seems very, very,
very expensive.
Are you using the default 'fuzziness' parameter (0.5)? It might end up
exploring a lot of documents; did you try to play with that parameter?
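To see why cost grows with both the number of words and the edit budget, here is a naive Java sketch (standard library only). Each query word is compared against every term of a toy dictionary using Levenshtein distance, and every term within maxEdits becomes one more clause of the OR query. This is an illustration, not Lucene's algorithm. Lucene's FuzzyQuery intersects a Levenshtein automaton with the terms index instead of scanning it, caps each word at maxExpansions terms (50 by default), and in recent versions takes the edit budget as maxEdits (0 to 2) plus an optional exact-match prefixLength. The class FuzzyExpansionSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Lucene code): naive fuzzy expansion of query words
// against a term dictionary, to show how the number of OR clauses grows.
final class FuzzyExpansionSketch {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Dictionary terms within maxEdits of the word: the clauses one fuzzy word turns into. */
    static List<String> expand(String word, List<String> dictionary, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(word, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    /** Total number of expanded term clauses for a multi-word query. */
    static int totalClauses(List<String> words, List<String> dictionary, int maxEdits) {
        int total = 0;
        for (String w : words) {
            total += expand(w, dictionary, maxEdits).size();
        }
        return total;
    }
}
```

With the toy dictionary lucene, lucent, lucerne, licence, search, starch, church, the word "lucene" expands to three terms at maxEdits = 1 and four at maxEdits = 2 (licence is two edits away), so every extra word and every extra allowed edit adds clauses to the final query.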
Have you tried to see how the performance changes if you do not use fuzzy (just
to see whether fuzzy is what introduces the slowdown)?
Or what happens to performance if you do fuzzy with 1, 2, or 5 terms instead of 10?
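One way to run the study suggested above is to time the same search callback for each word count and compare medians. The sketch below is a hypothetical, standard-library-only harness: runQuery stands in for whatever code builds and executes the query (fuzzy or exact) from the first n words, and the untimed warm-up calls are there because the JVM's JIT makes the first iterations unrepresentative. It is not a Lucene API, and a dedicated tool such as JMH gives more trustworthy numbers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical micro-benchmark harness for the "fewer words" study; not part of Lucene.
final class WordCountStudy {

    /** Median wall-clock time in nanoseconds over `runs` timed calls, after `warmup` untimed calls. */
    static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // untimed: gives the JIT a chance to compile the hot path
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // upper median when runs is even
    }

    /** Times runQuery.accept(n) for each word count n (e.g. 1, 2, 5, 10), keeping input order. */
    static Map<Integer, Long> study(IntConsumer runQuery, int[] wordCounts, int warmup, int runs) {
        Map<Integer, Long> medians = new LinkedHashMap<>();
        for (int n : wordCounts) {
            medians.put(n, medianNanos(() -> runQuery.accept(n), warmup, runs));
        }
        return medians;
    }
}
```

Running study once with fuzzy clauses and once with plain term clauses would separate the cost of fuzzy expansion from the cost of the boolean query itself.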
From: java-user@lucene.apache.org At: 06/09/21 18:56:31 To:
java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug
I can't reveal those details, I am very sorry, but it is more than 1 million.
Let me add that I have a lot of code that processes the results from Lucene,
but the bottleneck is Lucene fuzzy search.
Best regards
On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
How many documents do you have in the index?
And can you show an example of a query?
From: java-user@lucene.apache.org At: 06/09/21 18:33:25 To:
java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug
I have only two fields: one is a string, the other is a number (stored as a
string). I guess you can't go simpler than this.
I retrieve the hits, and my major bottleneck is Lucene fuzzy search.
I take each word from the string, which is usually at most around 10 words,
and build a fuzzy boolean query out of them.
A simple query is a query like this one, with up to 10 words.
By 'limit' I mean I want to stop the Lucene search at around 20 hits; I don't
want thousands of hits.
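The catch with stopping at "around 20 hits" is that the cheap version (stop after the first 20 matches in index order) returns arbitrary matches, while the best 20 by score requires examining, or provably skipping, every other candidate. The standard-library-only sketch below contrasts the two on a precomputed list of scored matches; HitLimitSketch and its methods are hypothetical, not Lucene classes. In Lucene, the top-20 request itself would be something like IndexSearcher.search(query, 20).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: "stop after n matches" versus "keep the n best matches".
final class HitLimitSketch {

    /** One matching document and its relevance score. */
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Stops after the first n matches in doc ID order: cheap, but blind to scores. */
    static List<Hit> firstN(List<Hit> matchesInDocIdOrder, int n) {
        int end = Math.max(0, Math.min(n, matchesInDocIdOrder.size()));
        return new ArrayList<>(matchesInDocIdOrder.subList(0, end));
    }

    /** Keeps the n highest-scoring matches; this naive version examines every match. */
    static List<Hit> topN(List<Hit> matchesInDocIdOrder, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore); // min-heap: weakest kept hit on top
        for (Hit h : matchesInDocIdOrder) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (h.score > heap.peek().score) {
                heap.poll(); // evict the weakest kept hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // highest score first
        return result;
    }
}
```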
Best regards
On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
Hi Baris,
> What if the user needs to limit the search process?
What do you mean by 'limit'?
> There should be a way to speed up Lucene, then, if this is not possible,
> since some simple queries take half a second, which is too long.
What do you mean by a 'simple' query? There might be multiple reasons behind
the slowness of a query that are unrelated to the search itself (for example,
if you retrieve many documents and extract the content of many fields from
each one). Would you like to tell us a bit more about your use case?
Regards,
Diego
From: java-user@lucene.apache.org At: 06/09/21 18:18:01 To:
java-user@lucene.apache.org
Cc: baris.ka...@oracle.com
Subject: Re: Potential bug
Thanks, Adrien, but the two numbers are too far apart.
I think the algorithm needs to be revised.
What if the user needs to limit the search process?
That leaves no control.
There should be a way to speed up Lucene, then, if this is not possible,
since some simple queries take half a second, which is too long.
Best regards
On 6/9/21 1:13 PM, Adrien Grand wrote:
Hi Baris,
totalHitsThreshold is actually a minimum threshold, not a maximum threshold:
hits are counted accurately up to that number, and beyond it the reported
count may only be a lower bound.
The problem is that Lucene cannot directly identify the top matching
documents for a given query. The strategy it adopts is to start collecting
hits naively in doc ID order and to progressively raise the bar on the
minimum score that is required for a hit to be competitive, in order to skip
non-competitive documents. So it's expected that Lucene still collects 100s
or 1000s of hits, even though the collector is configured to only compute
the top 10 hits.
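Adrien's description can be made concrete with a small simulation in plain Java. Matches arrive in doc ID order in blocks, each block carrying an upper bound on its scores (loosely modelled on the per-block score bounds Lucene uses for skipping). The collector keeps the best k scores in a min-heap, counts every hit until more than totalHitsThreshold have been counted, and from then on skips any block whose bound cannot beat the current k-th best score. This is a simplified model rather than TopScoreDocCollector itself, and the class, fields and block layout are assumptions. It also illustrates the report quoted below: the count covers every hit visited, so it can be far larger than k, and once skipping starts it is only a lower bound (Lucene's TotalHits expresses this through its relation, EQUAL_TO or GREATER_THAN_OR_EQUAL_TO).

```java
import java.util.PriorityQueue;

// Simplified model (not Lucene's implementation) of top-k collection that skips
// non-competitive blocks once enough hits have been counted.
final class TopKSkippingSketch {

    /** Outcome of one simulated search. */
    static final class Result {
        final int countedHits;     // hits actually visited and counted
        final boolean lowerBound;  // true once skipping was enabled: the count means ">=", not "=="
        final double[] topScores;  // the k best scores, highest first

        Result(int countedHits, boolean lowerBound, double[] topScores) {
            this.countedHits = countedHits;
            this.lowerBound = lowerBound;
            this.topScores = topScores;
        }
    }

    /**
     * scores[b] holds the scores of the matching docs in block b, in doc ID order;
     * blockMax[b] is an upper bound on every score in block b (assumed known up front).
     */
    static Result collect(double[][] scores, double[] blockMax, int k, int totalHitsThreshold) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap: peek() is the k-th best score so far
        int counted = 0;
        boolean skipping = false;
        for (int b = 0; b < scores.length; b++) {
            // Skip a block whose bound cannot beat the k-th best score (ties lose to earlier doc IDs).
            if (skipping && blockMax[b] <= heap.peek()) {
                continue;
            }
            for (double s : scores[b]) {
                counted++;
                if (heap.size() < k) {
                    heap.add(s);
                } else if (s > heap.peek()) {
                    heap.poll();
                    heap.add(s);
                }
                // Skipping starts only after more than totalHitsThreshold hits and a full heap.
                if (!skipping && counted > totalHitsThreshold && heap.size() == k) {
                    skipping = true;
                }
            }
        }
        double[] ascending = heap.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        double[] top = new double[ascending.length];
        for (int i = 0; i < ascending.length; i++) {
            top[i] = ascending[ascending.length - 1 - i]; // reverse to highest first
        }
        return new Result(counted, skipping, top);
    }
}
```

For example, blocks {1.0, 5.0, 2.0}, {1.5, 1.0}, {6.0, 0.5} and {0.2, 0.3} with k = 2 and totalHitsThreshold = 2 yield the same top scores (6.0, 5.0) as an exhaustive run, but count only 5 of the 9 hits and flag the count as a lower bound.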
On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
Hi,
I think this is a potential bug.
This time I set totalHitsThreshold to 10, and I get totalHits reported as
1655, but I get 10 results in total.
I think this suggests that there might be a bug in the
TopScoreDocCollector algorithm.
Best regards
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org