On Jan 26, 2007, at 2:25 PM, Gábor Farkas wrote:
>
> Julian 'Julik' Tarkhanov wrote:
>>
>> Python's unicode is actually UTF-16
>
> on linux it's usually utf-32, and on windows it's usually (always?)
> utf-16.

Sorry, I forgot that. It's been at least a year since I last touched Python (actually it was for the Django test drive).

>
> but you should not care about it. you see, in python,
> the unicode-strings are a separate data-type, and there's
> just no way to take a bytestring, and tell python: "from now on,
> you are an unicode-string, because i know that you are encoded in
> utf-16."

Segregating ustrings and strings is BBD, I've been saying so for years. The latest I heard is that the next major Python will abolish bytestrings for good.

Getting back to the issue we were on, I still strongly advocate the "don't go there" approach for anything but Unicode. How it should be handled in relation to source code is unknown to me (AFAIK Python has a preamble sort of declaration that you can use to tell the interpreter which encoding your source is in). I just know you hit some major pain when you expect ustrings and get bytestrings instead (and in Python, just as in Perl, only about 30% of the libraries actually care about what they give you).

> so while it might be, that the conversion from utf-16-bytestrings to
> unicode is sometimes faster than converting from utf-8-bytestrings to
> unicode, you can't be sure, because as i wrote above, the internal
> unicode-encoding is not fixed.
>
>> whereas IO and the databases mostly
>> speak UTF-8 -
>> so no, you can't dump it over the wire.
>
>> We Rubyists are a tad happier
>> because we now
>> have all in UTF-8
>
> you mean that regexes, and all the methods of the string-class now are
> unicode-aware in ruby? :)

Regexes have been unicode-aware for some time already, except for case-sensitivity and the class repertoire (which will be fixed when Oniguruma is there).

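To make the bytestring/unicode point above concrete, here is a minimal sketch, not anybody's actual code. It uses Python 3 syntax, where the text type is `str` and the byte type is `bytes` (in the Python 2 of this thread they were `unicode` and `str`). The helper name `to_text` is hypothetical, and the first line is the kind of source-encoding declaration mentioned above (PEP 263 style). It shows that the same text gives different bytes depending on the encoding, and that the only way from bytes to text is an explicit decode with a named encoding.

```python
# -*- coding: utf-8 -*-
# Sketch only: text and bytes are separate types, and converting between
# them always names an encoding explicitly.


def to_text(raw, encoding="utf-8"):
    """Hypothetical helper: accept bytes or str, always return str."""
    if isinstance(raw, bytes):
        # Bytes cannot be "relabelled" as text; they must be decoded.
        return raw.decode(encoding)
    return raw


name = "Gábor"

# The same text serialises to different byte sequences per encoding.
as_utf8 = name.encode("utf-8")
as_utf16 = name.encode("utf-16-le")

# Decoding with the matching encoding recovers the same text either way.
assert to_text(as_utf8) == name
assert to_text(as_utf16, "utf-16-le") == name
assert as_utf8 != as_utf16
```

Normalising to text at the boundary like this is one way to avoid the "expected ustrings, got bytestrings" pain, assuming you know which encoding the incoming bytes are in.
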
As for the string methods, we mostly took care of them with AS::Multibyte (without silly subclassing) and that works wonders for me. The greatest advantage is that I never have to check what's coming down the pipe because there's only one String to rule them all.

-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---