On Fri, Nov 09, 2012 at 09:38:54PM -0500, Daniel Holth wrote:
> Although I think the ~ is a very ugly -, it could be useful to change the
> separator to something less commonly used than the -.
>
> It would be useful to be able to use the hyphen - in the version of a package
> (for semver) and elsewhere. Using it as the separator could make parsing the
> file name a bit trickier than is healthy.
>
Items 10 and 11 of semver are problematic. Other people who consume versions, for instance Linux distributions, have a history of using dashes as a separator. They have to deal with stripping hyphens out of versions that make use of them.
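To make the parsing concern concrete, here is a minimal Python sketch. It assumes file names of the hypothetical form `name-version` with a single hyphen as the separator; `split_first` and `split_last` are made-up helpers for illustration, not distutils or setuptools code. Whichever hyphen you split on, some legal name or version comes out wrong.

```python
def split_first(stem):
    """Split on the first hyphen; wrong when the *name* contains a hyphen."""
    name, _, version = stem.partition("-")
    return name, version


def split_last(stem):
    """Split on the last hyphen; wrong when the *version* contains a hyphen."""
    name, _, version = stem.rpartition("-")
    return name, version


# Hypothetical stems paired with the (name, version) the packager intended.
cases = [
    ("foo-1.0.0-alpha", ("foo", "1.0.0-alpha")),          # semver pre-release
    ("foo-bar-1.0.0", ("foo-bar", "1.0.0")),              # hyphenated name
    ("foo-bar-1.0.0-alpha", ("foo-bar", "1.0.0-alpha")),  # both at once
]

for stem, intended in cases:
    print(stem,
          "first:", split_first(stem) == intended,
          "last:", split_last(stem) == intended)
# foo-1.0.0-alpha first: True last: False
# foo-bar-1.0.0 first: False last: True
# foo-bar-1.0.0-alpha first: False last: False
```

A rule that recovers both parts needs either a separator that never appears in names or versions, or versions that never contain one (which is what stripping hyphens achieves).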
The fact that distutils/setuptools also treats hyphens as separators is a good thing for these audiences.

[..]
>
> If we do this, I would like to allow Unicode package names at the same
> time. safe_name(), the pkg_resources function that escapes package names for
> file names, would become
>
>     re.sub(u"[^\w.]+", "_", u"package-name", flags=re.U)
>
> In other words, the rule for package names would be that they can contain any
> Unicode alphanumeric or _ or dot. Right now package names cannot practically
> contain non-ASCII because the setuptools installation will fold it all to _
> and installation metadata will collide on the disk.
>
I consider the limitation of package names to ASCII to be a blessing in disguise. In python3, non-ASCII module names are possible but not portable between systems. This is because a non-ASCII module name inside a python file is abstract text, but its representation on the filesystem depends on the user's locale. The consensus on python-dev when this was brought up seemed to be that using non-ASCII names within your own locale was important for learning to use python, but that distributing non-ASCII modules to other people was a bad idea. (If you have the attention span for long threads, see http://mail.python.org/pipermail/python-dev/2011-January/107467.html and note that the threading was broken several times but the subject line stayed the same.)

Description of the non-ASCII module problem for people who want a summary:

I have a python3 program that has::

    #!/usr/bin/python3 -tt
    # -*- coding: utf-8 -*-
    import café
    café.do_something()

python3 reads this file in and represents café as abstract text: the file is written in utf-8 and says so, so python3 can decode its contents to its internal representation. However, it then has to find the café module on disk. In my environment, I have LC_ALL=en_US.utf8, so python3 finds the file café.py and uses it to satisfy the import.

However, I have a colleague who works with me. He has access to my program over a shared filesystem (or it was distributed to him via a git checkout, copied via an sdist, etc.). His locale uses latin-1 (ISO8859-1) as its encoding (for instance, LC_ALL=en_US.ISO8859-1). When he runs my program, python3 is still able to read the application file itself (thanks to the coding line that declares it as utf-8), but when it searches the disk for a file to satisfy the café import, it runs into problems because the café.py filename is not encoded using latin-1.

Other scenarios where the files are shared were discussed in the thread I mentioned, but I won't go into all of them in this message. Hopefully you can generalize this example to how it will cause problems on PyPI, with pre-packaged system modules vs. users' modules, etc.

-Toshio
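A minimal Python sketch of the on-disk collision mentioned in the quoted proposal. `ascii_safe_name` is a hypothetical stand-in, assuming the current escaping replaces runs of characters outside `[A-Za-z0-9.]` and then writes the separator as `_` in file names; `unicode_safe_name` applies the quoted regular expression. Neither function is imported from pkg_resources.

```python
import re


def ascii_safe_name(name):
    # Hypothetical stand-in for the current ASCII-only escaping: runs of
    # characters outside [A-Za-z0-9.] collapse to "-", which is then
    # written as "_" in file names.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).replace("-", "_")


def unicode_safe_name(name):
    # The proposed rule: keep Unicode word characters and ".", escape the
    # rest to "_". A raw string is used so "\w" reaches the regex engine
    # unchanged.
    return re.sub(r"[^\w.]+", "_", name, flags=re.UNICODE)


names = ["café", "cafè", "caf-"]
print([ascii_safe_name(n) for n in names])    # ['caf_', 'caf_', 'caf_']
print([unicode_safe_name(n) for n in names])  # ['café', 'cafè', 'caf_']
```

The Unicode rule keeps the three names distinct, but only as text; how those names become bytes in a file name is still up to the locale, which is the problem described in the message.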
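The colleague's failure can be reproduced at the level of bytes, without touching the import system. This Python sketch assumes, as the scenario does, that a module name becomes a file name by encoding it with the locale's codec (utf-8 for me, latin-1 for my colleague); the real lookup goes through the interpreter's filesystem encoding, which this code does not consult.

```python
# The file as I created it: the name café.py, stored as UTF-8 bytes.
on_disk = "café.py".encode("utf-8")

# What each locale would look for when satisfying "import café"
# (assumption: the module name is encoded with the locale's codec).
wanted_utf8 = "café.py".encode("utf-8")
wanted_latin1 = "café.py".encode("latin-1")

print(on_disk)                   # b'caf\xc3\xa9.py'
print(wanted_latin1)             # b'caf\xe9.py'
print(wanted_utf8 == on_disk)    # True: my import finds the file
print(wanted_latin1 == on_disk)  # False: my colleague's lookup misses it

# Listing the directory in a latin-1 locale shows the same bytes as mojibake.
print(on_disk.decode("latin-1"))  # cafÃ©.py
```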
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
