Hi Daniel,

On Tue, 2008-06-17 at 20:38 +0200, Daniel Naber wrote:
> On Tuesday, 17 June 2008, László Monda wrote:
> 
> >     FuzzyQuery artist_query = new FuzzyQuery(new Term("artist", artist));
> 
> You should try the FuzzyQuery constructor that takes a minimum similarity
> and a prefix length. The general problem, however, is that the degree of
> similarity is only one factor. The other factors are the same as for other
> searches, e.g. the number of occurrences of the term in the document and in
> the whole index.
> 
> You could try to write your own similarity implementation that disables all 
> these factors, see
> http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Similarity.html
>  

I understand some of the essential concepts behind Lucene, such as the
Levenshtein distance and tokenization, but I'd rather not go this deep
unless it's necessary.

Since fuzzy searching is based on the Levenshtein distance, the distance
between "coldplay" and "coldplay" is 0 and the distance between
"coldplay" and "longplay" is 3, so how on earth is it possible that when
searching for "coldplay", Lucene returns "longplay"? This shouldn't
happen regardless of the minimum similarity and prefix length factors.
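A minimal sketch of the arithmetic, in plain Java with no Lucene on the classpath, may help here. It assumes that Lucene 2.x's FuzzyTermEnum skips terms that do not share the first prefixLength characters, computes similarity as 1 - distance / min(length of the two terms), and keeps a term only if that similarity is above minimumSimilarity (assumed defaults: 0.5 and prefix length 0). The class and method names are hypothetical.

```java
public class FuzzySimilaritySketch {

    // Classic dynamic-programming Levenshtein (edit) distance, two rows at a time.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed Lucene 2.x-style similarity: 1 - distance / length of the shorter term.
    static float similarity(String query, String term) {
        int minLength = Math.min(query.length(), term.length());
        if (minLength == 0) {
            return 0.0f; // avoid dividing by zero for empty strings
        }
        return 1.0f - (float) levenshtein(query, term) / minLength;
    }

    // A term matches only if it shares the required prefix and its similarity
    // is strictly above the minimum similarity (assumed behaviour).
    static boolean matches(String query, String term, float minSimilarity, int prefixLength) {
        if (prefixLength > query.length()
                || !term.startsWith(query.substring(0, prefixLength))) {
            return false;
        }
        return similarity(query, term) > minSimilarity;
    }

    public static void main(String[] args) {
        String query = "coldplay";
        for (String term : new String[] {"coldplay", "downplay", "longplay"}) {
            System.out.printf("%s: distance=%d, similarity=%.3f%n",
                    term, levenshtein(query, term), similarity(query, term));
        }
        System.out.println(matches(query, "longplay", 0.5f, 0)); // true: 0.625 > 0.5
        System.out.println(matches(query, "longplay", 0.7f, 0)); // false: 0.625 <= 0.7
        System.out.println(matches(query, "longplay", 0.5f, 1)); // false: "l" is not "c"
    }
}
```

Under these assumptions, a distance of 3 between two 8-letter terms gives a similarity of 0.625, which clears the default threshold of 0.5, so "longplay" is a legitimate fuzzy match. Raising the minimum similarity (e.g. to 0.7) or requiring a prefix length of 1, as in new FuzzyQuery(new Term("artist", artist), 0.7f, 1), would exclude it.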

Additional info: Lucene seems to do the right thing when only a few
documents are present, but goes crazy when there are about 1.5 million
documents in the index.
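A toy scoring model in plain Java can illustrate why index size might matter, following Daniel's point that similarity is only one factor. It assumes Lucene 2.x's DefaultSimilarity idf, ln(numDocs / (docFreq + 1)) + 1, enters a term's score squared, and that FuzzyQuery boosts each expanded term by (similarity - minimumSimilarity) / (1 - minimumSimilarity). tf is taken as 1, and lengthNorm, queryNorm and coord are left out. The document frequencies below are invented for illustration, and the class name is hypothetical.

```java
public class FuzzyScoreSketch {

    // Assumed Lucene 2.x DefaultSimilarity idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Assumed boost FuzzyQuery gives each expanded term:
    // (similarity - minSimilarity) / (1 - minSimilarity), so an exact match gets 1.0.
    static double fuzzyBoost(double similarity, double minSimilarity) {
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    // Toy per-document score for one expanded term: boost * idf^2 (tf = 1).
    // With flat = true, idf is fixed at 1, mimicking a Similarity that disables it.
    static double score(double boost, long docFreq, long numDocs, boolean flat) {
        double i = flat ? 1.0 : idf(docFreq, numDocs);
        return boost * i * i;
    }

    static void compare(String label, long numDocs, long coldplayDf, long longplayDf, boolean flat) {
        double exact = score(fuzzyBoost(1.0, 0.5), coldplayDf, numDocs, flat);   // "coldplay"
        double fuzzy = score(fuzzyBoost(0.625, 0.5), longplayDf, numDocs, flat); // "longplay"
        System.out.printf("%s: coldplay=%.2f longplay=%.2f%n", label, exact, fuzzy);
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies, not measured from any real index.
        compare("10 docs", 10, 3, 1, false);
        compare("1.5M docs", 1_500_000, 20_000, 3, false);
        compare("1.5M docs, flat", 1_500_000, 20_000, 3, true);
    }
}
```

With these made-up numbers, the exact match wins in the 10-document index. In the 1.5-million-document index, the rare term "longplay" outscores the common "coldplay" despite its lower boost, because its much higher idf is squared. Fixing idf at 1, as a custom Similarity could, lets the similarity-based boost decide the ranking again.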

> BTW, in general, there's more traffic on the java-user list and you might
> get more answers there.

Thanks for the suggestion, I might try java-user later.

> Regards
>  Daniel
> 
-- 
Laci  <http://monda.hu>
