[ https://issues.apache.org/jira/browse/LUCENE-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892341#action_12892341 ]

Eks Dev commented on LUCENE-2557:
---------------------------------

It looks like we have one invariant:
IDF(QueryTerm) >= IDF(ExpansionTerm)
This prevents documents that match only an expansion term (ET) from scoring
higher than documents with an exact match on the query term (QT).
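
To illustrate why the invariant is violated today, here is a minimal sketch
using the classic Lucene 3.x DefaultSimilarity IDF formula,
idf = 1 + ln(numDocs / (docFreq + 1)), computed in double rather than float.
The class name, helper invariantHolds() and the document counts are
hypothetical, chosen only to show a rare misspelling out-weighing a common
exact match.

```java
// Sketch only: the document counts are hypothetical, not taken from the issue.
public class IdfInvariant {

    // Same formula as DefaultSimilarity.idf(docFreq, numDocs) in Lucene 3.x,
    // but returning double instead of float.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // The invariant from the comment: an expansion term (ET) must not
    // out-weigh the query term (QT).
    static boolean invariantHolds(double idfQueryTerm, double idfExpansion) {
        return idfQueryTerm >= idfExpansion;
    }

    public static void main(String[] args) {
        int numDocs = 100_000;
        double smith = idf(1_000, numDocs); // common exact match
        double smiht = idf(2, numDocs);     // rare misspelling
        System.out.printf("idf(smith)=%.3f idf(smiht)=%.3f invariant=%b%n",
                smith, smiht, invariantHolds(smith, smiht));
    }
}
```

Because the misspelling is rarer, it gets the larger IDF and the invariant
does not hold, which matches the ranking problem described in the issue.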

Fixing all expansions to IDF(QT) would remove the dynamics of the score, making
the score contribution of every expansion identical. Maybe scaling the IDF of
all expansions proportionally would work better: it preserves the relative IDFs
of the expansions while keeping them at or below IDF(QT), so the invariant still
holds.
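
A minimal sketch of the proportional-scaling idea, assuming one common factor
is applied to every expansion so that the largest expansion IDF becomes exactly
IDF(QT). The class and method names are hypothetical (not Lucene API), and the
map is assumed to hold only expansion terms, not the query term itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene API: scale all expansion IDFs by one factor.
public class ProportionalIdfScaling {

    static Map<String, Double> scaleExpansions(double idfQueryTerm,
                                               Map<String, Double> expansionIdf) {
        double max = 0.0;
        for (double v : expansionIdf.values()) {
            max = Math.max(max, v);
        }
        // Only shrink: if no expansion exceeds IDF(QT), the factor is 1.
        double factor = (max > idfQueryTerm) ? idfQueryTerm / max : 1.0;
        Map<String, Double> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : expansionIdf.entrySet()) {
            // One shared factor keeps the ratios between expansions intact.
            scaled.put(e.getKey(), e.getValue() * factor);
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<String, Double> expansions = new LinkedHashMap<>();
        expansions.put("smiith", 10.0);
        expansions.put("smiht", 8.0);
        // prints {smiith=5.0, smiht=4.0}
        System.out.println(scaleExpansions(5.0, expansions));
    }
}
```

Fixing every expansion to IDF(QT) would instead give 5.0 to both terms; the
scaled version keeps the 10:8 ratio between them.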

When there is no matching QueryTerm, why not simply preserve each expansion
term's own IDF? What is averaging good for, performance?
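
A small sketch contrasting the two options for the no-exact-match case,
assuming "averaging" means giving every expansion the mean of the expansion
IDFs. The class and method names are hypothetical, and nothing here claims this
is what the attached patch does.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene API: two ways to weight expansions
// when the query term itself does not occur in the index.
public class NoExactMatchIdf {

    // (a) Every expansion gets the mean IDF, so rarity no longer matters.
    static double[] averaged(double[] expansionIdf) {
        double sum = 0.0;
        for (double v : expansionIdf) {
            sum += v;
        }
        double mean = expansionIdf.length == 0 ? 0.0 : sum / expansionIdf.length;
        double[] out = new double[expansionIdf.length];
        Arrays.fill(out, mean);
        return out;
    }

    // (b) Each expansion keeps its own IDF.
    static double[] preserved(double[] expansionIdf) {
        return expansionIdf.clone();
    }

    public static void main(String[] args) {
        double[] idf = {6.0, 9.0};
        System.out.println(Arrays.toString(averaged(idf)));  // [7.5, 7.5]
        System.out.println(Arrays.toString(preserved(idf))); // [6.0, 9.0]
    }
}
```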

> FuzzyQuery - fuzzy terms and misspellings are ranked higher than exact matches
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-2557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2557
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 3.0.2
>            Reporter: Jingkei Ly
>         Attachments: idf-scoring-test-case.patch, LUCENE-2557.patch
>
>
> The FuzzyQuery often causes misspellings to be ranked higher than the exact 
> match, which seems to be an undesirable property generally. 
> For example, in an index of surnames, if I search using a FuzzyQuery for 
> "smith", the misspellings such as "smiith", or "smiht" would appear near the 
> top of the search results ahead of documents that match "smith".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

