I'm in full agreement with Marc-Andre below, except I don't like (1) at all -- having used other APIs that always return Unicode (like the Python XML parsers) it bothers me to get Unicode for no reason at all. OTOH I think Python 3.0 should be using a Unicode model closer to Java's.

On 7/11/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Neil Hodgson wrote:
> > On unicode versions of Windows, for attributes like os.listdir,
> > os.getcwd, sys.argv, and os.environ, which can usefully return unicode
> > strings, there are 4 options I see:
> >
> > 1) Always return unicode. This is the option I'd be happiest to use,
> > myself, but expect this choice would change the behaviour of existing
> > code too much and so produce much unhappiness.
>
> Would be nice, but will likely break too much code - if you
> let Unicode object enter non-Unicode aware code, it is likely
> that you'll end up getting stuck in tons of UnicodeErrors. If you
> want to get a feeling for this, try running Python with -U command
> line switch.
>
> > 2) Return unicode when the text can not be represented in ASCII. This
> > will cause a change of behaviour for existing code which deals with
> > non-ASCII data.
>
> +1 on this one (s/ASCII/Python's default encoding).
>
> > 3) Return unicode when the text can not be represented in the default
> > code page. While this change can lead to breakage because of combining
> > byte string and unicode strings, it is reasonably safe from the point
> > of view of data integrity as current code is returning garbage strings
> > that look like '?????'.
>
> -1: code pages are evil and the reason why Unicode was invented
> in the first place. This would be a step back in history.
>
> > 4) Provide two versions of the attribute, one with the current name
> > returning byte strings and a second with a "u" suffix returning
> > unicode. This is the least intrusive, requiring explicit changes to
> > code to receive unicode data. For patch #1231336 I chose this approach
> > producing sys.argvu and os.environu.
>
> -1 - this is what Microsoft did for many of their APIs. The
> result is two parallel universes with two sets of features,
> bugs, documentation, etc.
>
> > For os.listdir the current behaviour of returning unicode when its
> > argument is unicode can be retained but that is not extensible to, for
> > example, sys.argv.
>
> I don't think that using the parameter type as "parameter"
> to function is a good idea. However, accepting both strings
> and Unicode will make it easier to maintain backwards
> compatibility.
>
> > Since this issue may affect many attributes a common approach
> > should be chosen.
>
> Indeed.
>
> --
> Marc-Andre Lemburg
> eGenix.com

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
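
For illustration only, option 2 as amended in the quoted reply (return Unicode only when the text cannot be represented in the default encoding) might be sketched roughly as below. The helper name native_or_unicode is hypothetical and not part of patch #1231336 or of any Python API. The sketch is written for a modern Python, where str is Unicode and bytes stands in for the 2.x byte string, and it assumes "ascii" as the default encoding, as in Python 2.x.

```python
def native_or_unicode(text, encoding="ascii"):
    """Hypothetical helper sketching option 2.

    Return a byte string when `text` can be represented in `encoding`,
    otherwise return the original Unicode string unchanged.
    """
    try:
        # Representable in the chosen encoding: hand back a byte string,
        # so code that only ever sees such data keeps its old behaviour.
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Not representable: fall back to Unicode instead of producing
        # lossy output such as '?????'.
        return text


# A plain ASCII name comes back as bytes; a non-ASCII one stays Unicode.
ascii_name = native_or_unicode("README.txt")
accented_name = native_or_unicode("r\u00e9sum\u00e9.txt")
```

The mixed return type is the point of contention: callers that only ever see ASCII data keep working with byte strings, while callers that meet non-ASCII data get Unicode and may need changes.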