[
https://issues.apache.org/jira/browse/TEXT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567384#comment-16567384
]
Jan Martin Keil edited comment on TEXT-129 at 8/2/18 7:21 PM:
--------------------------------------------------------------
This is expected behavior. For example:
{code:java}
A = "donald trump"
B = "trump"
lengthA = 12
lengthB = 5
range = max(lengthA, lengthB) / 2 - 1 = 12 / 2 - 1 = 5
{code}
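Therefore, each character of B is only compared against the characters of A that lie within 5 positions of its own index, so the matching candidates are: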
{code:java}
donald trump
t----- -> "donald" -> no t
-r----- -> "donald " -> no r
--u----- -> "donald t" -> no u
---m----- -> "donald tr" -> no m
----p----- -> "donald tru" -> no p
{code}
Therefore, there are no matching characters, and the resulting similarity is 0.
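The window check above can be reproduced with a short standalone sketch. This is not the Commons Text implementation; the class name JaroWindowSketch and the helpers matchRange and countMatches are hypothetical and only mirror the range formula used in this comment.

```java
// Hypothetical sketch of the match-window step described above.
// It is NOT the Commons Text source; it only mirrors the formula
// range = max(lengthA, lengthB) / 2 - 1 from this comment.
class JaroWindowSketch {

    // Matching range, clamped at 0 for very short inputs.
    static int matchRange(String a, String b) {
        return Math.max(Math.max(a.length(), b.length()) / 2 - 1, 0);
    }

    // Counts the characters of b that find an equal, not yet used
    // character of a at most `range` positions away from their own index.
    static int countMatches(String a, String b) {
        int range = matchRange(a, b);
        boolean[] used = new boolean[a.length()];
        int matches = 0;
        for (int i = 0; i < b.length(); i++) {
            int from = Math.max(0, i - range);
            int to = Math.min(a.length() - 1, i + range);
            for (int j = from; j <= to; j++) {
                if (!used[j] && a.charAt(j) == b.charAt(i)) {
                    used[j] = true;
                    matches++;
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(matchRange("donald trump", "trump"));   // prints 5
        System.out.println(countMatches("donald trump", "trump")); // prints 0
        System.out.println(countMatches("aa trump", "trump"));     // prints 5
        System.out.println(countMatches("aaa trump", "trump"));    // prints 0
    }
}
```

Under this sketch, "aa trump" still finds all five characters inside a window of 3, while "aaa trump" and "donald trump" push every character of "trump" just outside its window, which is consistent with the 0.0 results in the report.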
> incorrect result from JaroWinklerDistance (calculation is incorrect)
> ------------------------------------------------
>
> Key: TEXT-129
> URL: https://issues.apache.org/jira/browse/TEXT-129
> Project: Commons Text
> Issue Type: Bug
> Affects Versions: 1.4
> Environment: commons-lang3:3.7; HotSpot for Linux, 1.8;
> Reporter: Jason Lee
> Priority: Major
> Attachments: Screenshot from 2018-07-31 13-04-24.png
>
>
> JaroWinklerDistance resolves 0 similarity between "_trump_" and "_donald trump_"
> scala code here:
> scala> val jw=new JaroWinklerDistance
> scala> jw("trump","donald trump") // INCORRECT
> res1: Double = 0.0
> scala> jw("ivanka trump","donald trump") // correct
> res2: Double = 0.736111111111111
> scala> jw(" trump","trump") // correct result; there's a leading space in first string
> res13: Double = 0.9444444444444445
> scala> jw("a trump","trump") // correct
> res14: Double = 0.9047619047619048
> scala> jw("aa trump","trump") // correct
> res15: Double = 0.875
> scala> jw("aaa trump","trump") // INCORRECT
> res16: Double = 0.0
> scala> jw("hillary cliton","clinton") // correct
> res8: Double = 0.30952380952380953
> scala> jw("donald trump","trump") // INCORRECT
> res9: Double = 0.0