M. Steiger created LANG-1199:
--------------------------------

             Summary: Incorrect implementation of StringUtils.getJaroWinklerDistance()
                 Key: LANG-1199
                 URL: https://issues.apache.org/jira/browse/LANG-1199
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.4
            Reporter: M. Steiger


The current implementation of StringUtils.getJaroWinklerDistance() does not 
compute the correct result in some cases. See #LANG-944 for the initial code 
contribution.

StringUtils.getJaroWinklerDistance("Haus Ingeborg", "Ingeborg Esser") == 0.0

This is due to the incorrect computation of common characters, which causes the 
algorithm to exit prematurely.

In contrast, the implementation in Lucene returns ~0.63 for the same input, which is about right.

    JaroWinklerDistance d = new JaroWinklerDistance();
    d.getDistance("Haus Ingeborg", "Ingeborg Esser"); // ~0.63

See 
https://lucene.apache.org/core/3_0_3/api/contrib-spellchecker/org/apache/lucene/search/spell/JaroWinklerDistance.html
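
For reference, here is a minimal sketch of the textbook Jaro-Winkler computation. It is neither the Commons Lang nor the Lucene code. The class name JaroWinklerSketch, the method name similarity, the 0.7 boost threshold, the 0.1 prefix scale and the prefix cap of 4 are assumptions taken from the usual description of the algorithm. The point is that common characters are counted inside a sliding window over the longer string, so a shifted substring such as "Ingeborg" still matches.

```java
import java.util.Locale;

class JaroWinklerSketch {

    // Illustrative only: conventional Jaro-Winkler defaults are assumed.
    static double similarity(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        if (shorter.isEmpty()) {
            return longer.isEmpty() ? 1.0 : 0.0;
        }

        // Characters count as common only if they lie within this window.
        int window = Math.max(longer.length() / 2 - 1, 0);
        boolean[] usedShorter = new boolean[shorter.length()];
        boolean[] usedLonger = new boolean[longer.length()];
        int matches = 0;
        for (int i = 0; i < shorter.length(); i++) {
            int lo = Math.max(i - window, 0);
            int hi = Math.min(i + window + 1, longer.length());
            for (int j = lo; j < hi; j++) {
                if (!usedLonger[j] && shorter.charAt(i) == longer.charAt(j)) {
                    usedShorter[i] = true;
                    usedLonger[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk both strings over their matched characters, in order,
        // and count positions where the matched characters differ.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (!usedShorter[i]) {
                continue;
            }
            while (!usedLonger[j]) {
                j++;
            }
            if (shorter.charAt(i) != longer.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        int transpositions = outOfOrder / 2; // integer halving

        double m = matches;
        double jaro = (m / shorter.length() + m / longer.length()
                + (m - transpositions) / m) / 3.0;

        // Winkler bonus for a shared prefix, applied above the threshold only.
        int prefix = 0;
        int maxPrefix = Math.min(4, shorter.length());
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro < 0.7 ? jaro : jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        double score = similarity("Haus Ingeborg", "Ingeborg Esser");
        System.out.println(String.format(Locale.ROOT, "%.4f", score)); // prints 0.6302
    }
}
```

With a window of 6, nine characters match (the space plus "Ingeborg"), all nine are out of order, and the integer-halved transposition count is 4, which gives (9/13 + 9/14 + 5/9) / 3, or about 0.63.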



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
