[ 
https://issues.apache.org/jira/browse/LANG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876570#comment-13876570
 ] 

Benedikt Ritter commented on LANG-944:
--------------------------------------

Hello Rekha,

the second patch looks promising! The new tests look very good. I only have a 
few suggestions, regarding code symmetry:

 * I've made the suggestion to rename the method to {{getJaroWinklerScore()}}. 
In the meantime I've learned that Levenshtein algorithm and Jaro Winkler 
algorithm are just two different flavors of calculating the distance between 
two strings. Since we have called the other method getLevenshtein *Distance* 
(), it would be better if we call your method getJaroWinkler *Distance* ().
 * For the Levenshtein Distance there is a variation of the method that takes 
an additional threshold parameter. I've told you, that I'm no expert when it 
comes to string algorithms. Can the Jaro Winkler algorithm also be customized 
by an optional threshold parameter? Then it would be nice if you could provide 
such a variation of the method.
 * getLevenshteinDistance("", "") returns 0 but getJaroWinklerScore("", "") 
throws IllegalArgumentException. Is this by definition of the algorithm or can 
we change your implementation so that Jaro Winkler behaves the same for empty 
strings?

We would then have:
{code:java}
int getLevenshteinDistance(CharSequence, CharSequence)
int getLevenshteinDistance(CharSequence, CharSequence, int)
int getJaroWinklerDistance(CharSequence, CharSequence)
int getJaroWinklerDistance(CharSequence, CharSequence, int)
{code}

This looks sound to me. WDYT? Would you be so kind to improve you patch one 
last time?

TIA!
Benedikt

P.S.: I'd like to draw your attention to the 
[ICLA|http://www.apache.org/licenses] once more. It would be good if you would 
file one. It just takes 5 minutes and makes things a lot easier.

> Add a feature of SimilarityMatch in StringUtils 
> ------------------------------------------------
>
>                 Key: LANG-944
>                 URL: https://issues.apache.org/jira/browse/LANG-944
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>            Reporter: Rekha Joshi
>             Fix For: 3.3, Review Patch, Discussion
>
>         Attachments: LANG-944.1.patch, LANG-944.2.patch
>
>
> Add SimilarityMatch algorithm to evaluate a similarity matching ratio between 
> two strings.
> double matchscore = StringUtils.calculateSimilarityMatching(String s1, String 
> s2)
> I have a patch ready with implementation of similaritymatch.
> This happens to be a usual need in science algorithm and directly using 
> commons lang3 library for these string operation would be neat.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to