On Aug 2, 9:39 pm, "Jacob Kaplan-Moss" <[EMAIL PROTECTED]> wrote:
> Yuck, clients that don't speak HTTP correctly make me angry.
>
> Reading the RFC, though, I see that since HTTP 1.0 made "charset"
> optional, it remains so in HTTP 1.1, and we're supposed to "guess" and
> use ISO-8859-1 like you're doing in your code snippet. I suppose that
> means that Django's request object should do pretty much what you've
> done in this snippet.

This is a totally ridiculous flaw in the HTTP spec: you have no reliable way of telling what encoding a request coming in to your site uses, since you can't be sure that the user-agent read a page from your site to find out your character encoding!

One really smart trick is this: attempt to decode as UTF-8, which is nice and strict and will fail noisily for pretty much anything that isn't either UTF-8 or ASCII (a UTF-8 subset). If decoding fails, assume ISO-8859-1, which will decode absolutely anything without ever throwing an error (although if the content isn't actually ISO-8859-1 you'll end up with garbage).

I tend to call this the Flickr trick, because of the lovely big letters here:

http://www.flickr.com/services/api/misc.encoding.html

If it really matters, you can use Mark Pilgrim's chardet library to detect the most likely encoding based on statistical analysis:

http://chardet.feedparser.org/
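
For what it's worth, the UTF-8-then-ISO-8859-1 fallback described above fits in a few lines of Python. This is only a rough sketch, assuming the request data arrives as raw bytes; the name decode_lenient is made up for illustration and is not part of Django or of Flickr's code.

```python
from typing import Tuple


def decode_lenient(raw: bytes) -> Tuple[str, str]:
    """Decode raw request bytes, returning (text, encoding_used).

    Hypothetical helper: tries strict UTF-8 first, then falls back
    to ISO-8859-1.
    """
    try:
        # Strict UTF-8 rejects most byte sequences that are not valid
        # UTF-8 (plain ASCII passes, since it is a UTF-8 subset).
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # never raises. The result is garbage if the data was really
        # in some other encoding.
        return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    print(decode_lenient("café".encode("utf-8")))
    print(decode_lenient("café".encode("iso-8859-1")))
```

Returning the encoding alongside the text is just a convenience, so the caller can tell when the fallback kicked in and treat the result with more suspicion.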