On Jan 26, 2007, at 2:25 PM, Gábor Farkas wrote:

>
> Julian 'Julik' Tarkhanov wrote:
>>
>>
>> Python's unicode is actually UTF-16
>
> on linux it's usually utf-32, and on windows it's usually (always?)  
> utf-16.
Sorry, I forgot that. It's been at least a year since I last touched
Python (actually it was
for the Django test drive).
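
For anyone who wants to check which case applies to their own interpreter, here is a minimal sketch. It relies only on the standard `sys.maxunicode` attribute; the function name `unicode_build` is made up for illustration. On Python 3.3 and later the narrow/wide distinction is gone, so the check always reports the full range there.

```python
import sys

def unicode_build():
    """Report whether this interpreter is a narrow or a wide Unicode build."""
    # Narrow builds (UTF-16 code units internally) cap sys.maxunicode at 0xFFFF.
    if sys.maxunicode == 0xFFFF:
        return "narrow"
    # Wide builds, and every Python from 3.3 on, report the full range 0x10FFFF.
    return "wide"

print(unicode_build())
```
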
>
> but you should not care about it. you see, in python,
> the unicode-strings are a separate data-type, and there's
> just no way to take a bytestring, and tell python: "from now on,
> you are a unicode-string, because i know that you are encoded in
> utf-16."
Segregating ustrings and strings is BBD; I've been saying so for years.
The latest I heard
is that the next major Python release will abolish bytestrings for good.
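
To make Gábor's point concrete, here is a small sketch of the only route from a bytestring to a unicode string: an explicit decode that names the encoding. It is written in Python 3 spelling, where the two types are `bytes` and `str`; in the Python 2 of this thread they would be `str` and `unicode`. The sample text and variable names are purely illustrative.

```python
# A bytestring that we happen to know is UTF-16 (little-endian, no BOM).
raw = "na\u00efve".encode("utf-16-le")

# There is no cast: the text only exists after an explicit decode.
text = raw.decode("utf-16-le")

# Going back out over the wire or to a database usually means UTF-8.
wire = text.encode("utf-8")

print(type(raw).__name__, type(text).__name__, len(raw), len(text), len(wire))
```

The byte counts differ between the two encodings, which is why the encoding has to be named on every conversion.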

Getting back to the issue we were on: I still strongly advocate the
"don't go there" approach for anything but Unicode. How that should be
handled for source code, I don't know (AFAIK Python has a preamble-style
declaration that you can use
to tell the interpreter which encoding your source is in). I just
know you hit some major pain when you expect ustrings and
get bytestrings instead (and in Python, just as in Perl, only about
30% of the libraries actually care about what they give you).
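
The two things mentioned above can be sketched like this, as an illustration rather than anything Django or a particular library does. The first line is the source encoding declaration (it only takes effect on the first or second line of a file). `ensure_text` is a hypothetical helper name, and defaulting to UTF-8 is an assumption about what the library hands back. Python 3 spelling is used, so a bytestring is `bytes`.

```python
# -*- coding: utf-8 -*-

def ensure_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding it if it is a bytestring."""
    if isinstance(value, bytes):
        # Fail loudly on bad input instead of silently passing bytes along.
        return value.decode(encoding)
    return value

# Normalise at the boundary, whatever the library decided to return.
print(ensure_text(b"caf\xc3\xa9") == ensure_text("caf\u00e9"))
```

Calling a helper like this wherever data enters the program means the rest of the code only ever sees one string type.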

> so while it might be, that the conversion from utf-16-bytestrings to
> unicode is sometimes faster than converting from utf-8-bytestrings to
> unicode, you can't be sure, because as i wrote above, the internal
> unicode-encoding is not fixed.
>
>> whereas IO and the databases mostly
>> speak UTF-8 -
>> so no, you can't dump it over the wire.
>
>> We Rubyists are a tad happier
>> because we now
>> have all in UTF-8
>
> you mean that regexes, and all the methods of the string-class now are
> unicode-aware in ruby? :)

Regexes have been Unicode-aware for some time already, except for case
sensitivity and the class repertoire (which will be fixed when
Oniguruma is there). As for
the string methods, we mostly took care of them with AS::Multibyte
(without silly subclassing), and that works wonders for me. The
greatest advantage is that I never
have to check what's coming down the pipe because there's only one
String to rule them all.
-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
