> Would anyone know of any prior art for detection of "short edit distances"?
> (Perhaps even already on CPAN?)

As David & Zefram pointed out, Levenshtein is the classic algorithm for this, but there are plenty of others; in the SEE ALSO for Text::Levenshtein I’ve listed at least some of the ones I know of on CPAN:

    https://metacpan.org/pod/Text::Levenshtein#SEE-ALSO

A better algorithm for this purpose is the Damerau-Levenshtein edit distance:

Classic Levenshtein counts the number of insertions, deletions, and substitutions needed to get from one string to the other. Comparing "Algorithm::SVM" and "Algorithm::VSM" gives an edit distance of 2.

The Damerau variant adds transpositions of adjacent characters. This results in an edit distance of 1 for the example above, which is how my script found it.

I used Text::Levenshtein::Damerau::XS, because it’s quicker. That’s how I found the examples I gave yesterday.

I’ll tweak my script to not worry about packages in the same distribution (eg Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of new packages each day, and I’m just about there :-)

Neil
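
Illustration of the two distances discussed above: a minimal sketch in Python rather than Perl, purely to show why the SVM/VSM pair scores 2 under classic Levenshtein and 1 once adjacent transpositions are allowed. It is not the script mentioned above and does not use Text::Levenshtein::Damerau::XS. The function names `levenshtein` and `damerau_osa` are hypothetical, and the second function implements the restricted "optimal string alignment" form of Damerau-Levenshtein, which may differ from what the CPAN module computes on some inputs.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment).

    Same as Levenshtein, plus a cost-1 swap of two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition: "ab" in a lines up with "ba" in b.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    pair = ("Algorithm::SVM", "Algorithm::VSM")
    print(levenshtein(*pair), damerau_osa(*pair))
```

Run as a script, this prints the two distances for the example pair. Flagging every pair of package names at distance 1 would also catch same-distribution siblings such as Acme::Flat::GV and Acme::Flat::HV, which is why a separate filter on distribution is still needed.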