While you're at it, you might consider not allowing variation in case or in dash vs. underscore when specifying a dependency. A project should have exactly one concrete name, with no fuzziness, and a fuzzy match should result in a match failure. Fuzzy matching for a manual search is a different matter.
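A minimal sketch of that distinction in Python, assuming a hypothetical in-memory set of registered names. Dependency resolution requires an exact match and fails on any case or dash/underscore variant. A separate search helper (built here on the standard library's `difflib.get_close_matches`) offers fuzzy suggestions to a human. `REGISTERED`, `resolve_dependency`, `search` and `NoSuchProject` are illustrative names, not part of any existing tool.

```python
import difflib
from typing import Iterable

# Hypothetical in-memory index of registered project names (illustration only).
REGISTERED = {"Twisted", "zope.interface", "python-dateutil"}


class NoSuchProject(LookupError):
    """Raised when a dependency name does not exactly match a registered project."""


def resolve_dependency(name: str, index: Iterable[str] = REGISTERED) -> str:
    """Strict lookup used when resolving a dependency.

    No case folding and no '-' vs '_' equivalence: a near miss is a failure.
    """
    if name in set(index):
        return name
    raise NoSuchProject(name)


def search(query: str, index: Iterable[str] = REGISTERED, n: int = 3) -> list[str]:
    """Fuzzy matching, meant only for a human browsing the index."""
    # Compare case-insensitively, but report the registered spelling.
    lowered = {p.lower(): p for p in index}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=0.6)
    return [lowered[h] for h in hits]
```

For example, `resolve_dependency("twisted")` raises `NoSuchProject` even though `Twisted` is registered, while `search("python_dateutil")` suggests `python-dateutil`.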
On Wed, May 15, 2013 at 9:31 AM, Daniel Holth <dho...@gmail.com> wrote:
> How to avoid confusables.
>
> These scripts are recommended for use in identifiers:
> http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
>
> This report details a confusables detection algorithm:
> http://www.unicode.org/reports/tr39/#Confusable_Detection
>
> And ICU implements it:
> http://www.icu-project.org/apiref/icu4c/uspoof_8h.html (see also
> PyICU).
>
> The package index would enforce uniqueness of the "skeleton" of each
> registered package which is just an internal normalization based on
> confusability. if skeleton(identifier1) == skeleton(identifier2) then
> id1 and id2 are confusable.
>
> The tooling could get away with a simpler rule like
> re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)
>
> As a bonus to including the world, this should be able to prevent
> people from exchanging zeroes for capital O.
>
> On Wed, May 15, 2013 at 7:17 AM, Eric V. Smith <e...@trueblade.com> wrote:
> > On 05/15/2013 07:10 AM, Donald Stufft wrote:
> >>>>> Anyone want to run a scan over the PyPI package set to see
> >>>>> how many packages would cause problems for a "[a-zA-Z0-9_.-]"
> >>>>> only filter?
> >>>>
> >>>> See my previous email where I did queries against my local DB.
> >>>> It's 225 total projects that wouldn't be allowed.
> >>>
> >>> Can you send the list of those projects?
> >>>
> >>> Eric.
> >>>
> >>
> >> Here you go https://gist.github.com/dstufft/5583225 used a Python
> >> oneliner and the PyPI API so others can reproduce easily if they
> >> wish.
> >
> > Perfect. Thanks.
> >
> > It looks like space causes most of the issues. I'm not sure how
> > "Twisted Flow >= 1.0" would be expected to parse.
> >
> > Eric.
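The simpler tooling rule and the ASCII-only filter from the quoted thread can be sketched in Python as below. This is a minimal sketch: `tooling_normalize`, `passes_ascii_filter`, `toy_skeleton` and `ToyIndex` are hypothetical names, and the three-entry confusable table is a toy stand-in for the TR39 confusables data (which ICU's spoof checker implements), not the real skeleton algorithm. One detail worth noting: `re.sub()` takes `count` as its fourth positional parameter, so `re.UNICODE` has to be passed as `flags=` (or compiled into the pattern); for `str` patterns in Python 3 it is the default anyway.

```python
import re
import unicodedata

# The "simpler rule" for tooling: collapse every run of characters that are
# not word characters, digits or '.' into a single '_'.
_RUNS = re.compile(r"[^\w\d.]+", re.UNICODE)


def tooling_normalize(distribution: str) -> str:
    return _RUNS.sub("_", distribution)


# The "[a-zA-Z0-9_.-]" only filter discussed in the thread.
_ASCII_NAME = re.compile(r"[a-zA-Z0-9_.-]+")


def passes_ascii_filter(name: str) -> bool:
    return _ASCII_NAME.fullmatch(name) is not None


# Toy confusable table (hypothetical, three entries only). A real index would
# use the full Unicode TR39 confusables data instead.
_TOY_CONFUSABLES = str.maketrans({"0": "o", "1": "l", "I": "l"})


def toy_skeleton(name: str) -> str:
    """Toy skeleton: NFKC-normalize, fold a few look-alikes, then casefold."""
    return unicodedata.normalize("NFKC", name).translate(_TOY_CONFUSABLES).casefold()


class ToyIndex:
    """Registry that rejects a name whose toy skeleton is already taken."""

    def __init__(self) -> None:
        self._by_skeleton: dict[str, str] = {}

    def register(self, name: str) -> None:
        key = toy_skeleton(name)
        existing = self._by_skeleton.get(key)
        if existing is not None:
            raise ValueError(f"{name!r} is confusable with registered {existing!r}")
        self._by_skeleton[key] = name
```

With this toy table, `tooling_normalize("Twisted Flow")` gives `Twisted_Flow`, the ASCII filter rejects both `Twisted Flow` and `café`, and registering `DJANG0` after `Django` raises `ValueError`, which is the zero-for-capital-O case mentioned in the quoted message.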
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig