On Fri, 19 Dec 2008 15:20:08 -0700, Joe Strout wrote:

> Marc 'BlackJack' Rintsch wrote:
>
>>> And because strings in Python, unlike in (say) REALbasic, do not know
>>> their encoding -- they're just a string of bytes. If they were a
>>> string of bytes PLUS an encoding, then every string would know what it
>>> is, and things like conversion to another encoding, or concatenation
>>> of two strings that may differ in encoding, could be handled
>>> automatically.
>>>
>>> I consider this one of the great shortcomings of Python, but it's
>>> mostly just a temporary inconvenience -- the world is moving to
>>> Unicode, and with Python 3, we won't have to worry about it so much.
>>
>> I don't see the shortcoming in Python <3.0. If you want real strings
>> with characters instead of just a bunch of bytes simply use `unicode`
>> objects instead of `str`.
>
> Fair enough -- that certainly is the best policy. But working with any
> other encoding (sometimes necessary when interfacing with any other
> software), it's still a bit of a PITA.

But it has to be. There is no automagic guessing possible.

>> And does REALbasic really use byte strings plus an encoding!?
>
> You betcha! Works like a dream.

IMHO a strange design decision. A lot more hassle compared to an opaque
unicode string type which uses some internal encoding that makes
operations like getting a character at a given index easy or
concatenating without the need to reencode.

Ciao,
	Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
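
To make the "no automagic guessing" point concrete, here is a minimal sketch. The byte values and variable names are illustrative assumptions, not taken from the thread. It shows the same bytes yielding different text under two encodings, and the usual pattern of decoding at the boundary, working on text, and encoding once on output. It should run unchanged on Python 2.6+ (where the decoded objects are `unicode`) and on Python 3 (where they are `str`).

```python
# The same two bytes, interpreted under two different encodings.
# Nothing in the bytes themselves says which interpretation is right.
raw = b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")      # one character: U+00E9
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3, U+00A9

# Concatenating data that arrived in different encodings:
# decode each piece with its known encoding, join the text objects,
# then encode once when handing the result to other software.
from_latin1 = b"caf\xe9".decode("latin-1")
from_utf8 = b" cr\xc3\xa8me".decode("utf-8")

combined = from_latin1 + from_utf8   # plain text, no encoding attached
out = combined.encode("utf-8")       # bytes again, encoding chosen explicitly
```

Once everything is decoded, indexing and concatenation operate on characters and never need a re-encode; the encoding only matters at the input and output edges.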