Hi Daniel,
On Tue, 2008-06-17 at 20:38 +0200, Daniel Naber wrote:
> On Tuesday, 17 June 2008, László Monda wrote:
>
> > FuzzyQuery artist_query = new FuzzyQuery(new Term("artist",
> > artist));
>
> You should try the FuzzyQuery constructor that takes a minimum similarity
> and a prefix length. The general problem is however, that the degree of
> similarity is only one factor. The other factors are the same as for other
> searches, e.g. the number of occurrences of the term in the document and in
> the whole index.
>
> You could try to write your own similarity implementation that disables all
> these factors, see
> http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Similarity.html
>

I understand some essential concepts related to Lucene, such as the
Levenshtein distance and tokenization, but I really don't want to go this
deep if it's not necessary.

Since fuzzy searching is based on the Levenshtein distance, the distance
between "coldplay" and "coldplay" is 0 and the distance between "coldplay"
and "downplay" is 3, so how on earth is it possible that when searching for
"coldplay", Lucene returns "longplay"? This shouldn't happen regardless of
the minimum similarity and prefix length factors.

Additional info: Lucene seems to do the right thing when only a few
documents are present, but goes crazy when there are about 1.5 million
documents in the index.

> BTW, In general, there's more traffic on the java-user list and you might
> get more answers there.

Thanks for the suggestion, I might try java-user later.

> Regards
> Daniel
>

-- 
Laci <http://monda.hu>
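
P.S. To make the distances mentioned above easy to check, here is a minimal
sketch of the textbook Levenshtein computation in plain Java. It is not
Lucene's own code; the class and method names are made up for illustration.
It only prints the raw edit distance from "coldplay" to each of the three
terms discussed in this thread, and says nothing about how Lucene turns a
distance into a score.

```java
public class LevenshteinSketch {

    // Textbook dynamic-programming edit distance: insertions, deletions
    // and substitutions each cost 1. Hypothetical helper, not a Lucene API.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // match or substitution
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "coldplay";
        for (String term : new String[] {"coldplay", "downplay", "longplay"}) {
            System.out.println(query + " -> " + term + ": "
                    + levenshtein(query, term));
        }
    }
}
```

Running it gives 0 for "coldplay", 3 for "downplay" and also 3 for
"longplay", so by raw edit distance alone the last two terms are equally
far from the query.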
