Re: unicode.. reject?

Ivan Sagalaev Tue, 30 May 2006 23:55:41 -0700

gabor wrote:

>>Using this convention 
>>one can write international apps without worries since all messy things 
>>are made inside the framework.
>>    
>>
>
>well, i would replace 'without worries' with 'good enough' :)
>  
>
Ok :-)


>for example, i was building a very simple web-file-manager. you can ask 
>python to get you the os.listdir, etc, data in unicode, which i think 
>the only sensible way, because otherwise you have to watch out for the 
>filesystem-encoding. but because the querystring and httpresponse is 
>bytestring, i have .encode('utf8'), .decode('utf8') all the way.
>  
>
Not at all. You can talk to the file system in utf-8 byte strings and do 
no conversions at all. I'm doing it in my app and it works just fine.
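
A minimal sketch of what "no conversions" can look like, assuming Python 3 and a file system that stores names as UTF-8 bytes (typical on Linux). The helper name and the file name are hypothetical, picked only for illustration; this is not code from my app.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """Return the entries of `directory` as raw byte strings.

    Passing a bytes path makes os.listdir return bytes, so no
    decoding step is involved.
    """
    return sorted(os.listdir(os.fsencode(directory)))


# Hypothetical file name, already encoded as UTF-8 bytes.
utf8_name = "тест.txt".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    # Create the file through a bytes path as well.
    path = os.path.join(os.fsencode(tmp), utf8_name)
    with open(path, "wb"):
        pass
    entries = list_names_as_bytes(tmp)
```

The names come back exactly as the bytes that were written, so they can go into a utf-8 page or query string unchanged.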

I don't know exactly how Python handles the encoding of data that it 
receives from the OS, and I think there can be problems with a 
misconfigured OS giving out those characters in a non-UTF-8 byte 
encoding. But unicode won't help here either, since Python would try to 
convert some arbitrary byte stream to unicode and throw an exception. 
With a byte string I suppose you will get the string without errors but 
with garbage in it. Either way, all this just requires configuring 
locales properly.
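
To make the two failure modes concrete, here is a small sketch, assuming Python 3 and made-up sample text. It only shows what a strict decode does with bytes in the wrong encoding; it says nothing about how os.listdir itself behaves (in Python 3 it uses the surrogateescape error handler for file names, so it does not raise in this case).

```python
# Bytes a misconfigured system might hand over: Cyrillic text encoded as
# windows-1251 where UTF-8 is expected. The sample text is illustrative.
raw = "Привет".encode("cp1251")


def try_decode_utf8(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


strict_result = try_decode_utf8(raw)  # strict decoding fails for these bytes
passthrough = raw                     # kept as bytes: no error, wrong content
```

Either way the data is wrong; the only difference is whether you find out through an exception or through garbage on the page.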

And as a sidenote: HttpResponse can happily accept unicode, with 
conversion to DEFAULT_CHARSET on actual output.
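
The idea of encoding only at output time can be sketched like this. The class below is a hypothetical stand-in written for this mail, not Django's HttpResponse, and the "utf-8" value for DEFAULT_CHARSET is an assumption of the sketch.

```python
DEFAULT_CHARSET = "utf-8"  # assumed value for this sketch


class SketchResponse:
    """Hypothetical stand-in for a response object; not Django's code."""

    def __init__(self, content, charset=DEFAULT_CHARSET):
        self.content = content
        self.charset = charset

    def body(self):
        """Return the bytes to send; text is encoded only at this point."""
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode(self.charset)


response = SketchResponse("Grüße, 日本語")
payload = response.body()
```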

>hmm... is it that hard to get some japanese text and maybe some german 
>names? :)
>  
>
I meant that it's hard when you have to install some weird fonts and look 
at symbols you don't understand :-). But you are right. I think every 
developer in the ASCII world can work with, say, accented Latin characters 
and test whether they are uppercased or counted properly.

On the other hand, people working closely with non-ASCII characters 
develop some useful debugging habits that one just can't pick up from 
theory alone. For example, I can very quickly distinguish a string that 
is in utf-8 shown in a windows-1251 locale from a string in windows-1251 
shown in a utf-8 locale just by looking at it. To a person not working 
with Cyrillic they will all look equally weird, I think... Which kind of 
supports the idea of all-unicode, since Python represents byte strings 
differently than unicode ones.
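
Here is roughly what the two kinds of garbage look like, as a sketch assuming Python 3 and a sample word chosen for illustration. The last two lines show the repr difference between byte strings and unicode strings.

```python
text = "Привет"  # sample text, chosen for illustration

# UTF-8 bytes shown by a windows-1251 viewer: for this sample every letter
# becomes two characters, and the first of each pair is "Р" or "С".
utf8_as_cp1251 = text.encode("utf-8").decode("cp1251")

# windows-1251 bytes shown by a UTF-8 viewer: the bytes are not valid UTF-8,
# so a lenient viewer shows replacement characters instead.
cp1251_as_utf8 = text.encode("cp1251").decode("utf-8", errors="replace")

# Python marks the two types differently in their repr.
bytes_repr = repr(text.encode("utf-8"))  # starts with b'
text_repr = repr(text)                   # starts with a plain quote
```

The first kind of damage is reversible (encode back to windows-1251, decode as utf-8), while the second one has already lost the original bytes.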
