[ https://issues.apache.org/jira/browse/LUCENE-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833885#action_12833885 ]
Robert Muir commented on LUCENE-124:
------------------------------------

Uwe pointed out to me that there is a naming problem with
TOP_TERMS_CONSTANT_BOOLEAN_REWRITE: the BooleanQuery as a whole will not
produce the same score as CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE does.

I think the behavior makes sense, though, since it wouldn't make sense to use
TOP_TERMS without per-term boosting. But we need to fix the naming, and
TOP_TERMS_BOOST_BOOLEAN_REWRITE sounds confusing.

> Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-124
>                 URL: https://issues.apache.org/jira/browse/LUCENE-124
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.2
>         Environment: Operating System: All
>                      Platform: All
>            Reporter: Cormac Twomey
>            Assignee: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-124.patch
>
>
> According to the website's "Query Syntax" page, fuzzy searches are given a
> boost of 0.2. I've found this not to be the case, and have seen situations
> where exact matches have lower relevance scores than fuzzy matches.
> Rather than getting a boost of 0.2, it appears that all variations on the
> term are first found in the model, where dist* > 0.5.
> * dist = levenshteinDistance / length of min(termlength, variantlength)
> This then leads to a boolean OR search of all the variant terms, each of
> whose boost is set to (dist - 0.5)*2 for that variant.
> The upshot of all of this is that there are many cases where a fuzzy match
> will get a higher relevance score than an exact match.
> See this email for a test case to reproduce this anomalous behaviour.
> http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg02819.html
> Here is a candidate patch to address the issue -
> *** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java  Sun Jun 09 13:47:54 2002
> --- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java  Fri Mar 14 11:37:20 2003
> ***************
> *** 99,105 ****
>       }
>
>       final protected float difference() {
> !         return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
>       }
>
>       final public boolean endEnum() {
> --- 99,109 ----
>       }
>
>       final protected float difference() {
> !         if (distance == 1.0) {
> !             return 1.0f;
> !         }
> !         else
> !             return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
>       }
>
>       final public boolean endEnum() {
> ***************
> *** 111,117 ****
>        ******************************/
>
>       public static final double FUZZY_THRESHOLD = 0.5;
> !     public static final double SCALE_FACTOR = 1.0f / (1.0f - FUZZY_THRESHOLD);
>
>       /**
>           Finds and returns the smallest of three integers
> --- 115,121 ----
>        ******************************/
>
>       public static final double FUZZY_THRESHOLD = 0.5;
> !     public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f - FUZZY_THRESHOLD));
>
>       /**
>           Finds and returns the smallest of three integers
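
For readers who want to see what the candidate patch does to the per-variant boost, here is a minimal stand-alone sketch. It is not Lucene code: the class FuzzyBoostSketch and its method names are hypothetical, and only the two formulas are copied from the diff. It also assumes, as the patch's `distance == 1.0` branch implies, that `distance` is a similarity value where 1.0 means an exact match.

```java
// Hypothetical illustration only; FuzzyBoostSketch is not part of Lucene.
final class FuzzyBoostSketch {
    static final double FUZZY_THRESHOLD = 0.5;
    // Scale factor before the patch: maps (0.5, 1.0] onto (0.0, 1.0].
    static final double ORIGINAL_SCALE = 1.0 / (1.0 - FUZZY_THRESHOLD);
    // Scale factor after the patch: the same mapping shrunk by 0.2.
    static final double PATCHED_SCALE = 0.2 * (1.0 / (1.0 - FUZZY_THRESHOLD));

    // Boost formula from the unpatched difference().
    static float originalDifference(double distance) {
        return (float) ((distance - FUZZY_THRESHOLD) * ORIGINAL_SCALE);
    }

    // Boost formula from the patched difference(): exact matches keep a
    // boost of 1.0, every other variant is scaled down.
    static float patchedDifference(double distance) {
        if (distance == 1.0) {
            return 1.0f;
        }
        return (float) ((distance - FUZZY_THRESHOLD) * PATCHED_SCALE);
    }

    public static void main(String[] args) {
        // Sample similarity values, assumed for illustration.
        double[] samples = {1.0, 0.9, 0.75, 0.6};
        for (double d : samples) {
            System.out.printf("distance=%.2f original=%.2f patched=%.2f%n",
                    d, originalDifference(d), patchedDifference(d));
        }
    }
}
```

Under these assumptions, an exact match keeps a boost of 1.0 while every non-exact variant above the threshold gets a boost strictly below 0.2, which is the gap between exact and fuzzy matches that the reporter was asking for.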