Hello,
I'm using Lucene v3.
Please consider the following spellings:
Lucene
Lucéne
lucéne
Lucane
Lucen
When searching for "lucéne" among those words using a FuzzyQuery (with a
minimum similarity of 0.5), the results are:
1. Lucene 1.0259752
2. Lucane 1.0259752
3. Lucéne 0.95660806
4. lucéne 0.95660806
5. Lucen 0.30779266
#4 is an exact match. Why does it receive a lower score than "Lucane", which
contains one incorrect letter?
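
To make "exact match" versus "one incorrect letter" concrete, here is a minimal sketch that computes a plain Levenshtein distance and a similarity between the query and each title. It rests on two assumptions that are not confirmed in this message: that StandardAnalyzer lowercases the titles but keeps the accent, and that the fuzzy similarity is 1 - distance / (length of the shorter term). The class and method names are made up for illustration and are not Lucene API.

```java
import java.util.Locale;

public class FuzzySimilaritySketch {

    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / min(len(query), len(term)).
    public static double similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return query.length() == term.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(query, term) / shorter;
    }

    public static void main(String[] args) {
        // \u00e9 is the precomposed "é", written as an escape so the
        // source compiles the same way under any file encoding.
        String query = "luc\u00e9ne";
        String[] titles = {"Lucene", "Luc\u00e9ne", "luc\u00e9ne", "Lucane", "Lucen"};
        for (String title : titles) {
            // ASSUMPTION: the analyzer lowercases and keeps the accent.
            String term = title.toLowerCase(Locale.ROOT);
            System.out.printf(Locale.ROOT, "%-7s distance=%d similarity=%.3f%n",
                    title, editDistance(query, term), similarity(query, term));
        }
    }
}
```

Under these assumptions "Lucéne" and "lucéne" both collapse to the query term itself (distance 0), "Lucene" and "Lucane" are one substitution away, and "Lucen" is two edits away, so the raw similarity alone would rank the exact matches first.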
Also, if you raise the minimum similarity a bit (0.6 or above), everything
becomes normal:
1. Lucéne 1.0438477
2. lucéne 1.0438477
3. Lucene 0.97959816
4. Lucane 0.97959816
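
One possible reading of the two listings, offered as a sketch and not as a confirmed explanation: if each matched term contributes roughly boost * idf^2, the rarer terms "lucene" and "lucane" (one document each) get a larger idf than "lucéne" (two documents once the titles are lowercased). The formulas below are assumptions: the boost is taken to be the similarity rescaled as (similarity - minSimilarity) / (1 - minSimilarity), and idf is taken to be 1 + ln(numDocs / (docFreq + 1)). tf, field norm and query norm are treated as identical for all hits. All names are hypothetical.

```java
import java.util.Locale;

public class FuzzyScoreSketch {

    // ASSUMPTION: the per-term boost is the similarity rescaled so that
    // minSimilarity maps to 0 and an exact match maps to 1.
    public static double boost(double similarity, double minSimilarity) {
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    // ASSUMPTION: idf = 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Unnormalised score: boost * idf^2. Factors assumed to be shared by
    // every hit (tf, field norm, query norm) are left out.
    public static double relativeScore(double similarity, double minSimilarity,
                                       int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        return boost(similarity, minSimilarity) * idf * idf;
    }

    public static void main(String[] args) {
        int numDocs = 5; // the five titles indexed in this message
        for (double minSim : new double[] {0.5, 0.6}) {
            // "lucéne": similarity 1.0, assumed to occur in 2 documents.
            double exact = relativeScore(1.0, minSim, 2, numDocs);
            // "lucene" or "lucane": similarity 5/6, 1 document each.
            double oneEdit = relativeScore(5.0 / 6.0, minSim, 1, numDocs);
            System.out.printf(Locale.ROOT,
                    "minSim=%.1f exact=%.4f oneEdit=%.4f%n", minSim, exact, oneEdit);
        }
    }
}
```

With these assumed formulas the one-edit terms come out ahead of the exact match at 0.5 and behind it at 0.6, and the ratios between the two groups are close to the ratios of the scores listed above. That is consistent with the idf difference outweighing the boost difference at the lower threshold, but it does not prove that this is what Lucene does internally.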
Any idea?
Thanks in advance...
The code I use:
/**
 * @param args the command line arguments
 */
public static void main(String[] args) throws IOException, ParseException
{
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    // TODO code application logic here
    Directory index = new RAMDirectory();
    IndexWriter w = new IndexWriter(index, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    addDoc(w, "Lucene");
    addDoc(w, "Lucéne");
    addDoc(w, "lucéne");
    addDoc(w, "Lucane");
    addDoc(w, "Lucen");
    w.close();

    FuzzyQuery q = new FuzzyQuery(new Term("title", "lucéne"), 0.5f);

    // 3. search
    IndexSearcher searcher = new IndexSearcher(index);
    TopDocs collector = searcher.search(q, 10);
    ScoreDoc[] hits = collector.scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; i++)
    {
        Document d = searcher.doc(hits[i].doc);
        System.out.println((i + 1) + ". " + d.get("title") + " " + hits[i].score);
    }

    // searcher can only be closed when there
    // is no need to access the documents any more.
    searcher.close();
}

private static void addDoc(IndexWriter w, String value) throws IOException
{
    Document doc = new Document();
    doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}
--
View this message in context:
http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.