[ https://issues.apache.org/jira/browse/LANG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rekha Joshi updated LANG-944:
-----------------------------
Attachment: LANG-944.1.patch
Thanks, Benedikt.
Currently StringUtils has getLevenshteinDistance for string distance, but in
many cases we need a similarity score.
The attached patch has a Jaro-Winkler similarity implementation. A higher score
indicates greater similarity, which is very helpful in data mining.
StringUtils.getLevenshteinDistance("PENNSYLVANIA", "PENCILVANYA") = 4, which
does not clearly convey a similarity ratio.
With the patch, StringUtils.getSimilarityScore("PENNSYLVANIA", "PENCILVANYA") = 0.87.
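For readers who want to see what a Jaro-Winkler score involves, here is a minimal sketch of the textbook algorithm. It is not the code in LANG-944.1.patch: the class name JaroWinklerSketch and its method names are hypothetical, and the prefix scale of 0.1 with a 4-character prefix cap are the commonly used defaults, assumed here. The exact score depends on such details, so this sketch need not reproduce the 0.87 quoted above.

```java
public final class JaroWinklerSketch {

    private JaroWinklerSketch() {}

    /** Jaro similarity in [0, 1]; 1.0 means the strings are identical. */
    public static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }

        // Characters count as matching only if they are equal and lie
        // within this many positions of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk both matched sequences in order and count positions that
        // disagree; half of that count is the number of transpositions.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0;
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro score boosted for a shared prefix (assumed cap 4, scale 0.1). */
    public static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

Unlike an edit distance, the result is already normalized to the range 0 to 1, so scores for string pairs of different lengths can be compared directly.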
> Add a feature of SimilarityMatch in StringUtils
> ------------------------------------------------
>
> Key: LANG-944
> URL: https://issues.apache.org/jira/browse/LANG-944
> Project: Commons Lang
> Issue Type: New Feature
> Components: lang.*
> Affects Versions: 3.3
> Reporter: Rekha Joshi
> Fix For: Patch Needed, Discussion
>
> Attachments: LANG-944.1.patch
>
>
> Add a SimilarityMatch algorithm to compute a similarity ratio between
> two strings:
> double matchscore = StringUtils.calculateSimilarityMatching(String s1, String
> s2)
> I have a patch ready with an implementation of SimilarityMatch.
> This is a common need in scientific algorithms, and being able to use the
> Commons Lang 3 library directly for these string operations would be neat.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)