You're looking for Levenshtein distance: https://en.m.wikipedia.org/wiki/Levenshtein_distance . There's a Python module that already implements this (an extension module, actually, for speed).
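To make the idea concrete, here is a minimal pure-Python sketch of Levenshtein distance with a normalized similarity score on top. It is not the extension module mentioned above. The names `levenshtein`, `similarity`, and `similar_pairs`, and the 0.8 threshold, are made up for illustration, and you would want to tune the threshold for your own strings.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 1]; 1.0 means identical (hypothetical helper)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_pairs(strings, threshold=0.8):
    """Return all pairs scoring at or above the (assumed) threshold."""
    return [
        (a, b)
        for a, b in combinations(strings, 2)
        if similarity(a, b) >= threshold
    ]
```

Called on a list of source strings, `similar_pairs` should flag spelling-level variants such as "test case" vs "TestCase". Rephrasings such as "user does not exist" vs. "the user specified was not found" will likely score low on plain edit distance, so those probably need a different approach.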
On Sun, Sep 15, 2019, 16:26 Alexander Todorov <atodo...@mrsenko.com> wrote:

> Hi folks,
> I am looking for some tool (or algorithm which I can implement at the worst)
> which calculates similarities between strings. I would turn this into a pylint
> plugin b/c this is how I would consume it in my projects.
>
> My background is that we've identified *duplicate* or *similar* strings in our
> project which are marked for translation. Some of these are upper case vs. lower
> case and all of the variations between (I can lower case everything before
> sending to the tool of course), variances in spelling, e.g. "test case" vs
> "TestCase", variations into how certain words/combination of words are used
> together in a sentence, e.g. "user does not exist" vs. "the user specified was
> not found".
>
> Ideally I'd like to consume this tool in CI and based on the results reduce the
> number of source strings needed for translation and make life for translators
> easier.
>
> Feel free to propose anything, I have not done any research on this topic.
>
>
> Thanks,
> Alex
> _______________________________________________
> code-quality mailing list -- code-quality@python.org
> To unsubscribe send an email to code-quality-le...@python.org
> https://mail.python.org/mailman3/lists/code-quality.python.org/