Re: Getting irrelevant results using fuzzy query

mark harwood Mon, 23 Jun 2008 04:30:27 -0700

>>I do have serious problems with the relevance of the results with fuzzy 
>>queries.


Please take the time to read my response here:

     http://www.gossamer-threads.com/lists/lucene/java-user/62050#62050

A work colleague ran into exactly the same problem this week, and the
solution is the same.

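I just tested my index with a standard Lucene FuzzyQuery for "Paul~": it returns
"Phul", "Saul", and "Paulo" before ANY "Paul" records. This is an IDF issue: each
fuzzy variant is scored with its own inverse document frequency, so rare variants
(often misspellings) outrank the common exact term.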
Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
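Below is a simplified Java sketch of the IDF effect described above. It is a model, not Lucene code. The document frequencies are hypothetical, and so is the 1.5 million document count (chosen to echo László's index size). The similarity is a plain edit-distance ratio rather than Lucene's exact formula, and tf, length norms, queryNorm and coord are ignored. The idf function is the one from Lucene's classic DefaultSimilarity. Ranking with each variant's own IDF mimics FuzzyQuery. Ranking with the source term's IDF shared by all variants mimics the idea behind FuzzyLikeThisQuery, whose documentation says it scores every variant with the source term's IDF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of fuzzy scoring; NOT real Lucene scoring (tf, norms, queryNorm, coord ignored).
class FuzzyIdfDemo {
    static final int NUM_DOCS = 1_500_000;

    // Hypothetical document frequencies: the exact term is common, the variants are rare.
    static final Map<String, Integer> DOC_FREQ = new LinkedHashMap<>();
    static {
        DOC_FREQ.put("paul", 20_000);
        DOC_FREQ.put("phul", 3);
        DOC_FREQ.put("saul", 150);
        DOC_FREQ.put("paulo", 400);
    }

    // idf as defined by Lucene's classic DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq) {
        return 1.0 + Math.log(NUM_DOCS / (double) (docFreq + 1));
    }

    // Standard Levenshtein edit distance (dynamic programming).
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed similarity: 1 - distance / length of the shorter term.
    static double similarity(String query, String term) {
        return 1.0 - editDistance(query, term) / (double) Math.min(query.length(), term.length());
    }

    // sharedIdf == false: each variant uses its own idf (FuzzyQuery-style).
    // sharedIdf == true: every variant uses the query term's idf (FuzzyLikeThisQuery-style).
    // The query term is assumed to be present in DOC_FREQ.
    static double score(String query, String term, boolean sharedIdf) {
        int df = sharedIdf ? DOC_FREQ.get(query) : DOC_FREQ.get(term);
        return similarity(query, term) * idf(df);
    }

    // All known terms, best score first; ties keep insertion order (List.sort is stable).
    static List<String> rank(String query, boolean sharedIdf) {
        List<String> terms = new ArrayList<>(DOC_FREQ.keySet());
        terms.sort((x, y) -> Double.compare(score(query, y, sharedIdf), score(query, x, sharedIdf)));
        return terms;
    }

    public static void main(String[] args) {
        System.out.println("own idf per variant: " + rank("paul", false));
        System.out.println("shared idf of paul:  " + rank("paul", true));
    }
}
```

With these made-up numbers the per-variant ranking is [phul, saul, paulo, paul], reproducing the ordering reported above. The shared-IDF ranking puts paul first. The three variants tie with each other and keep their insertion order.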



----- Original Message ----
From: László Monda <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Monday, 23 June, 2008 12:10:05 PM
Subject: Re: Getting irrelevant results using fuzzy query

On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> On Wednesday, 18 June 2008, László Monda wrote:
> 
> > Additional info: Lucene seems to do the right thing when only few
> > documents are present, but goes crazy when there is about 1.5 million
> > documents in the index.
> 
> Lucene works well with more documents (currently using it with 9 million).
> But the fuzzy query requires iteration over all terms, which makes this
> query slow. This can be avoided by setting the prefixLength parameter of the
> FuzzyQuery constructor to 1 or 2. Or maybe you should use an n-gram index;
> see the spellchecker in the contrib area.

Thanks for the suggestion, but I don't have any performance problems
yet, but I do have serious problems with the relevance of the results
with fuzzy queries.

-- 
Laci  <http://monda.hu>

