http://issues.apache.org/bugzilla/show_bug.cgi?id=32942

           Summary: Fuzzy query scoring issues
           Product: Lucene
           Version: 1.2rc5
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: [EMAIL PROTECTED]


Queries that automatically expand into multiple terms (wildcard, range, prefix,
fuzzy, etc.) currently suffer from two problems:

1) Scores for matching documents are significantly lower than for term queries
because of the number of terms introduced (for example, a match on the query
Foo~ scores 0.1, whereas a match on the query Foo scores 1).
2) Rarer forms of the expanded terms are favoured over more common forms
because of the IDF. With fuzzy queries, for example, rare misspellings
typically appear in results before the more common correct spellings.
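
Both effects can be reproduced with a small sketch in plain Java (no Lucene
classes involved). It assumes the classic default formulas, coord = overlap /
maxOverlap and idf = ln(numDocs / (docFreq + 1)) + 1. The class name, the
index size, the ten-term expansion and the document frequencies are all
hypothetical values chosen for illustration.

```java
public class ExpandedQueryScoring {

    // Assumed default IDF formula: ln(numDocs / (docFreq + 1)) + 1.
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Assumed default coord formula: fraction of query clauses that matched.
    public static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        int numDocs = 10000;       // hypothetical index size
        int expandedTerms = 10;    // hypothetical: Foo~ expands to 10 terms

        // Problem 1: a document matching only one of the expanded terms
        // is scaled down by coord, unlike a plain single-term query.
        System.out.println("coord, term query:  " + coord(1, 1));
        System.out.println("coord, fuzzy query: " + coord(1, expandedTerms));

        // Problem 2: the rare misspelling receives a larger IDF than the
        // common correct spelling, so it tends to rank higher.
        int rareMisspellingDocFreq = 5;    // hypothetical
        int correctSpellingDocFreq = 2000; // hypothetical
        System.out.println("idf, rare misspelling: "
                + idf(rareMisspellingDocFreq, numDocs));
        System.out.println("idf, correct spelling: "
                + idf(correctSpellingDocFreq, numDocs));
    }
}
```

With these assumed numbers, the coord factor for the expanded query is one
tenth of that for the term query, and the rare form gets the larger IDF.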


I will attach a patch that corrects the issues identified above by:
1) Overriding Similarity.coord to counteract the reduction in scores
introduced by expanding terms.
2) Using the IDF factor of the most common form among the expanded terms as
the basis for scoring all other expanded terms.
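
The patch itself is not reproduced in this message, so the following is only
a sketch of the two ideas in plain Java, not the patch and not Lucene API
code. It assumes the classic default IDF formula, ln(numDocs / (docFreq + 1))
+ 1. The class name, the method names sharedIdf and flatCoord, and the
document frequencies are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedIdfSketch {

    // Assumed default IDF formula: ln(numDocs / (docFreq + 1)) + 1.
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Idea 2: take the IDF of the most common expanded term (largest
    // document frequency) and use it for every expanded term.
    public static double sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int docFreq : docFreqs.values()) {
            maxDocFreq = Math.max(maxDocFreq, docFreq);
        }
        return idf(maxDocFreq, numDocs);
    }

    // Idea 1: a coord replacement that ignores how many expanded terms
    // matched, so expansion does not scale the score down.
    public static double flatCoord(int overlap, int maxOverlap) {
        return 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 10000; // hypothetical index size

        // Hypothetical expansion of a fuzzy query with document frequencies.
        Map<String, Integer> docFreqs = new LinkedHashMap<String, Integer>();
        docFreqs.put("receive", 2000); // common correct spelling
        docFreqs.put("recieve", 5);    // rare misspelling

        double shared = sharedIdf(docFreqs, numDocs);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            System.out.println(e.getKey()
                    + " own idf=" + idf(e.getValue(), numDocs)
                    + " shared idf=" + shared);
        }
        System.out.println("coord for 1 of 10 terms: " + flatCoord(1, 10));
    }
}
```

Because every expanded term is weighted with the same IDF, a rare misspelling
no longer outranks the common spelling on IDF alone; the remaining scoring
factors decide the order.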
