On Aug 2, 9:39 pm, "Jacob Kaplan-Moss" <[EMAIL PROTECTED]>
wrote:
> Yuck, clients that don't speak HTTP correctly make me angry.
>
> Reading the RFC, though, I see that since HTTP 1.0 made "charset"
> optional, it remains so in HTTP 1.1, and we're supposed to "guess" and
> use ISO-8859-1 like you're doing in your code snippet. I suppose that
> means that Django's request object should do pretty much what you've
> done in this snippet.

This is a totally ridiculous flaw in the HTTP spec - you literally
have no reliable way of telling what encoding a request coming in to
your site uses, since you can't be absolutely sure that the user-agent
read a page from your site to find out your character encoding!

One really smart trick you can do is this: attempt to decode as UTF-8
(which is nice and strict and will fail noisily for pretty much
anything that isn't either UTF-8 or ASCII, a UTF-8 subset). If
decoding fails, assume ISO-8859-1, which will decode absolutely
anything without ever throwing an error, since in practice every byte
value maps to some character (although if the content isn't
ISO-8859-1 you'll end up with garbage). I tend to call this the Flickr
trick, because of the lovely big letters here:
http://www.flickr.com/services/api/misc.encoding.html
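
For what it's worth, here is a minimal sketch of that trick in Python.
The function name decode_request_bytes is made up for illustration; it
isn't anything in Django, just the try-UTF-8-then-fall-back idea
written out:

```python
def decode_request_bytes(raw: bytes) -> str:
    """Decode raw request bytes with the UTF-8-first fallback.

    Hypothetical helper, not part of Django or any library.
    """
    try:
        # UTF-8 is strict: most non-UTF-8 byte sequences raise here.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 accepts every byte value, so this never raises,
        # but the result is garbage if the bytes were in some other
        # encoding.
        return raw.decode("iso-8859-1")
```

Note that the fallback only tells you decoding succeeded, not that the
guess was right.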

If it really matters, you can use Mark Pilgrim's chardet library to
detect the most likely encoding based on statistical analysis:
http://chardet.feedparser.org/
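
A rough sketch of how that might look, assuming chardet is installed
and using its detect() function, which returns a dict with an
"encoding" key. The wrapper name guess_decode and the fallback to
ISO-8859-1 when chardet is missing or unsure are my own choices for
illustration:

```python
def guess_decode(raw: bytes) -> str:
    """Decode bytes using chardet's best guess (hypothetical helper)."""
    try:
        import chardet  # third-party; assumed to be installed
    except ImportError:
        chardet = None

    encoding = None
    if chardet is not None:
        # detect() may report None when it cannot make a guess.
        encoding = chardet.detect(raw).get("encoding")

    try:
        return raw.decode(encoding or "iso-8859-1")
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: fall back to the
        # encoding that accepts any byte sequence.
        return raw.decode("iso-8859-1")
```

Detection is statistical, so short inputs can easily be guessed wrong.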

