You're looking for the Levenshtein distance
(https://en.m.wikipedia.org/wiki/Levenshtein_distance). There's a Python
module that implements this already (actually an extension module, for speed).
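
If you just want to see the idea before pulling in a dependency, here is a
rough pure-Python sketch of the textbook dynamic-programming version. It is
not the code of any particular module; the names levenshtein, similar_pairs
and max_distance are made up for illustration, and the threshold of 2 is an
arbitrary assumption you would need to tune.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similar_pairs(strings, max_distance=2):
    """Yield (s1, s2, distance) for every pair within max_distance.

    Strings are lowercased before comparison; max_distance is a
    hypothetical threshold, not a recommended value.
    """
    for s1, s2 in combinations(strings, 2):
        distance = levenshtein(s1.lower(), s2.lower())
        if distance <= max_distance:
            yield s1, s2, distance


if __name__ == "__main__":
    candidates = ["test case", "TestCase", "user does not exist"]
    for s1, s2, distance in similar_pairs(candidates):
        print(f"{s1!r} ~ {s2!r} (distance {distance})")
```

Note that edit distance will catch spelling variants like "test case" vs
"TestCase", but rephrasings such as "user does not exist" vs "the user
specified was not found" are many edits apart, so those would need a
different approach.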

On Sun, Sep 15, 2019, 16:26 Alexander Todorov <atodo...@mrsenko.com> wrote:

> Hi folks,
> I am looking for some tool (or algorithm which I can implement at the worst)
> which calculates similarities between strings. I would turn this into a pylint
> plugin b/c this is how I would consume it in my projects.
>
> My background is that we've identified *duplicate* or *similar* strings in our
> project which are marked for translation. Some of these are upper case vs. lower
> case and all of the variations between (I can lower case everything before
> sending to the tool of course), variances in spelling, e.g. "test case" vs
> "TestCase", variations into how certain words/combination of words are used
> together in a sentence, e.g. "user does not exist" vs. "the user specified was
> not found".
>
> Ideally I'd like to consume this tool in CI and based on the results reduce the
> number of source strings needed for translation and make life for translators
> easier.
>
> Feel free to propose anything, I have not done any research on this topic.
>
>
> Thanks,
> Alex
> _______________________________________________
> code-quality mailing list -- code-quality@python.org
> To unsubscribe send an email to code-quality-le...@python.org
> https://mail.python.org/mailman3/lists/code-quality.python.org/
>