[ http://issues.apache.org/jira/browse/LUCENE-124?page=comments#action_12330223 ]

Mark Harwood commented on LUCENE-124:
-------------------------------------

I would suggest this is a duplicate of 
http://issues.apache.org/jira/browse/LUCENE-329

The idf rating of all expanded terms should be the same, so that rarer terms 
are not favoured. I suggest that this applies to all auto-expanding searches, 
e.g. range queries.
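
As a rough illustration of the idf point, here is a minimal Java sketch. It is 
not Lucene code: the class and method names and the document counts are made 
up, the idf formula is assumed to have the usual log(numDocs / (docFreq + 1)) + 1 
form, and the weight ignores tf and length norms.

```java
// Illustrative only; not Lucene source. All names and numbers are hypothetical.
class IdfExpansionSketch {

    // Assumed idf form: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Simplified term weight: boost * idf^2, ignoring tf and length norms.
    static double weight(double boost, int docFreq, int numDocs) {
        double termIdf = idf(docFreq, numDocs);
        return boost * termIdf * termIdf;
    }

    // Same weight, but with one idf value shared by every expanded term.
    static double weightWithSharedIdf(double boost, double sharedIdf) {
        return boost * sharedIdf * sharedIdf;
    }

    public static void main(String[] args) {
        int numDocs = 10000;
        int exactDocFreq = 2000;   // correctly spelled term, common
        int variantDocFreq = 3;    // misspelled variant, rare

        double exact = weight(1.0, exactDocFreq, numDocs);
        double variant = weight(0.6, variantDocFreq, numDocs);
        System.out.println("per-term idf: exact=" + exact + " variant=" + variant);

        double shared = idf(exactDocFreq, numDocs);
        System.out.println("shared idf:   exact=" + weightWithSharedIdf(1.0, shared)
                + " variant=" + weightWithSharedIdf(0.6, shared));
    }
}
```

With per-term idf the rare variant outweighs the exact term despite its lower 
boost; with a shared idf the ordering simply follows the boosts.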

Should we drop this bug as a duplicate?

> Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc
> ------------------------------------------------------------------------
>
>          Key: LUCENE-124
>          URL: http://issues.apache.org/jira/browse/LUCENE-124
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.2
>  Environment: Operating System: All
> Platform: All
>     Reporter: Cormac Twomey
>     Assignee: Lucene Developers

>
> According to the website's "Query Syntax" page, fuzzy searches are given a
> boost of 0.2. I've found this not to be the case, and have seen situations
> where exact matches have lower relevance scores than fuzzy matches.
> Rather than getting a boost of 0.2, it appears that all variations on the term
> are first found in the model, where dist* > 0.5.
> * dist = 1 - (levenshteinDistance / min(termlength, variantlength))
> This then leads to a boolean OR search of all the variant terms, each of whose
> boost is set to (dist - 0.5)*2 for that variant.
> The upshot of all of this is that there are many cases where a fuzzy match
> will get a higher relevance score than an exact match.
> See this email for a test case to reproduce this anomalous behaviour.
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html
> Here is a candidate patch to address the issue -
> *** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java   Sun Jun 09 13:47:54 2002
> --- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java  Fri Mar 14 11:37:20 2003
> ***************
> *** 99,105 ****
>       }
>       
>       final protected float difference() {
> !         return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
>       }
>       
>       final public boolean endEnum() {
> --- 99,109 ----
>       }
>       
>       final protected float difference() {
> !             if (distance == 1.0) {
> !                     return 1.0f;
> !             }
> !             else
> !                     return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
>       }
>       
>       final public boolean endEnum() {
> ***************
> *** 111,117 ****
>        ******************************/
>       
>       public static final double FUZZY_THRESHOLD = 0.5;
> !     public static final double SCALE_FACTOR = 1.0f / (1.0f - FUZZY_THRESHOLD);
>       
>       /**
>        Finds and returns the smallest of three integers 
> --- 115,121 ----
>        ******************************/
>       
>       public static final double FUZZY_THRESHOLD = 0.5;
> !     public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f - FUZZY_THRESHOLD));
>       
>       /**
>        Finds and returns the smallest of three integers
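
The boost calculation described in the quoted report, and the effect of the 
candidate patch, can be sketched in standalone Java. This is an illustration 
under assumptions, not the FuzzyTermEnum source: the class and helper names 
are hypothetical, terms are assumed to be non-empty, and dist is assumed to be 
a similarity of the form 1 - editDistance / min(length), which is what the 
patch's `distance == 1.0` check for an exact match suggests.

```java
// Illustrative only; not the Lucene FuzzyTermEnum source. Names are hypothetical.
class FuzzyBoostSketch {
    static final double FUZZY_THRESHOLD = 0.5;

    // Plain dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1.0 for identical strings, lower as edits increase.
    static double similarity(String term, String variant) {
        int minLen = Math.min(term.length(), variant.length());
        return 1.0 - (double) editDistance(term, variant) / minLen;
    }

    // Boost before the patch: (dist - 0.5) * 2.
    static float originalBoost(double dist) {
        return (float) ((dist - FUZZY_THRESHOLD) * (1.0 / (1.0 - FUZZY_THRESHOLD)));
    }

    // Boost after the patch: exact matches keep 1.0, variants stay below 0.2.
    static float patchedBoost(double dist) {
        if (dist == 1.0) {
            return 1.0f;
        }
        return (float) ((dist - FUZZY_THRESHOLD) * 0.2 * (1.0 / (1.0 - FUZZY_THRESHOLD)));
    }

    public static void main(String[] args) {
        double dist = similarity("lucene", "lucent");
        System.out.println("dist=" + dist
                + " original=" + originalBoost(dist)
                + " patched=" + patchedBoost(dist));
    }
}
```

For a one-edit variant of a six-letter term, this gives a boost of roughly 
0.67 before the patch and roughly 0.13 after it, while an exact match gets 1.0 
in both cases.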


