[
https://issues.apache.org/jira/browse/LANG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876570#comment-13876570
]
Benedikt Ritter commented on LANG-944:
--------------------------------------
Hello Rekha,
the second patch looks promising! The new tests look very good. I only have a
few suggestions, regarding code symmetry:
* I've made the suggestion to rename the method to {{getJaroWinklerScore()}}.
In the meantime I've learned that Levenshtein algorithm and Jaro Winkler
algorithm are just two different flavors of calculating the distance between
two strings. Since we have called the other method getLevenshtein *Distance*
(), it would be better if we call your method getJaroWinkler *Distance* ().
* For the Levenshtein Distance there is a variation of the method that takes
an additional threshold parameter. I've told you, that I'm no expert when it
comes to string algorithms. Can the Jaro Winkler algorithm also be customized
by an optional threshold parameter? Then it would be nice if you could provide
such a variation of the method.
* getLevenshteinDistance("", "") returns 0 but getJaroWinklerScore("", "")
throws IllegalArgumentException. Is this by definition of the algorithm or can
we change your implementation so that Jaro Winkler behaves the same for empty
strings?
We would then have:
{code:java}
int getLevenshteinDistance(CharSequence, CharSequence)
int getLevenshteinDistance(CharSequence, CharSequence, int)
int getJaroWinklerDistance(CharSequence, CharSequence)
int getJaroWinklerDistance(CharSequence, CharSequence, int)
{code}
This looks sound to me. WDYT? Would you be so kind to improve you patch one
last time?
TIA!
Benedikt
P.S.: I'd like to draw your attention to the
[ICLA|http://www.apache.org/licenses] once more. It would be good if you would
file one. It just takes 5 minutes and makes things a lot easier.
> Add a feature of SimilarityMatch in StringUtils
> ------------------------------------------------
>
> Key: LANG-944
> URL: https://issues.apache.org/jira/browse/LANG-944
> Project: Commons Lang
> Issue Type: New Feature
> Components: lang.*
> Reporter: Rekha Joshi
> Fix For: 3.3, Review Patch, Discussion
>
> Attachments: LANG-944.1.patch, LANG-944.2.patch
>
>
> Add SimilarityMatch algorithm to evaluate a similarity matching ratio between
> two strings.
> double matchscore = StringUtils.calculateSimilarityMatching(String s1, String
> s2)
> I have a patch ready with implementation of similaritymatch.
> This happens to be a usual need in science algorithm and directly using
> commons lang3 library for these string operation would be neat.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)