Frequently used algorithms for string edit distance and similarity:

Levenshtein & Damerau-Levenshtein distance
Jaro & Jaro-Winkler distance
N-Gram distance
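
As a rough illustration of the first item, here is a minimal pure-Python
sketch of Levenshtein distance, plus a normalised score for comparing pairs
of strings. The function names (levenshtein, similarity, similar_pairs) and
the 0.8 threshold are my own assumptions for the example, not part of any
existing library or pylint API.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score from 0.0 (unrelated) to 1.0 (identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_pairs(strings, threshold=0.8):
    """Return (a, b, score) for each pair scoring at or above threshold."""
    # threshold=0.8 is an arbitrary starting point (assumption); tune it.
    pairs = []
    for a, b in combinations(strings, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

With this sketch, "test case" and "TestCase" score well above the threshold,
since they differ by a single space once lower-cased. Rephrasings such as
"user does not exist" vs. "the user specified was not found" score low,
because character-level edit distance does not capture meaning.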

-- 
Rain Chen

On September 16, 2019 at 4:26:53 AM, Todorov Alexander (atodo...@mrsenko.com)
wrote:

Hi folks,
I am looking for a tool (or, at worst, an algorithm I can implement) that
calculates similarities between strings. I would turn this into a pylint
plugin because that is how I would consume it in my projects.

My background is that we've identified *duplicate* or *similar* strings in
our project which are marked for translation. Some of these differ only in
upper case vs. lower case and all of the variations in between (I can
lower-case everything before sending it to the tool, of course). Others are
variances in spelling, e.g. "test case" vs. "TestCase", or variations in how
certain words or combinations of words are used together in a sentence,
e.g. "user does not exist" vs. "the user specified was not found".

Ideally I'd like to consume this tool in CI and, based on the results,
reduce the number of source strings needed for translation and make life
easier for translators.

Feel free to propose anything, I have not done any research on this topic.


Thanks,
Alex
_______________________________________________
code-quality mailing list -- code-quality@python.org
To unsubscribe send an email to code-quality-le...@python.org
https://mail.python.org/mailman3/lists/code-quality.python.org/
