On Jan 28, 4:03 am, "ak" <[EMAIL PROTECTED]> wrote:
> After some thoughts I came to the following conclusion: if you guys
> want to keep support of legacy charsets in fact you don't have to
> force model objects too be unicoded. Firstly, they are passed to
> templates and filters and we can't mix legacy charsets with unicode in
> one template. Next, if I don't use unicode, I don't have to code my
> python sources (views) in unicode. So, I need to be able to pass
> string values into my model objects and my strings are not unicoded.
>
> So if everyone agreed, the way is simple:
> 1. when django loads data from db and fills in a model object, all
> strings have to be encoded according to DEFAULT_CHARSET
> 2. when django passes data from form object to model object, it has to
> encode strings according to DEFAULT_CHARSET again

This is quite confusing. It seems you're advocating decoding/encoding
multiple times.

Being a Norwegian involved in web development in China, I love Unicode,
and I've been fighting with it for 6-7 years. This is what I've learned:

1) Unicode != external character encoding. Programming languages with
Unicode support have an internal Unicode representation, and all code
that needs to understand the concept of a "character" deals with this;
e.g., lowercasing, sorting. You never worry about what this
representation is (you're assuming too much about the programming
language if you do). Instead you:

    decode from a character encoding (e.g., UTF-8, ISO8859-1, GB18030)
    into this representation

    encode this internal representation into a character encoding

UTF-8 and UTF-16 are character encodings. GB18030 is a Chinese character
encoding that can represent every code point in the Unicode standard,
just like UTF-8 and UTF-16. Older encodings are usually language/locale
specific, so they can only represent a small subset of the code points
(characters) in Unicode.

I'm not sure what "unicoding" or "unicodifying" means. Is it decoding
into the internal Unicode representation, or the process of making your
code Unicode aware and compatible?

Joel has a nicely written intro:
http://www.joelonsoftware.com/articles/Unicode.html

2) Unicode is an all-or-nothing thing (not obvious). If you try to use
it partly, sometimes, or only somewhere, you'll end up with
UnicodeErrors popping up everywhere and a very inefficient architecture
with multiple encodings/decodings happening during each request... Oh,
this module doesn't do Unicode, better give it UTF-8, but then it has to
pass something back, which should be of type unicode, but it doesn't
know which character encoding we're using, so then I have to pass that
to it, ... ad nauseam.

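To make the decode/encode steps from point 1 concrete, here is a minimal sketch in Python 3 terms, where bytes is the encoded form and str is the internal representation. The function name handle_request and its parameters are made up for illustration; this is not Django code, just the shape of "decode once on the way in, encode once on the way out".

```python
# Sketch only: handle_request and its charset parameters are hypothetical
# names used for illustration, not part of Django or any library.

def handle_request(raw: bytes, in_charset: str, out_charset: str) -> bytes:
    # Decode once, at the input edge.
    text = raw.decode(in_charset)
    # Character-level work (here: lowercasing) happens on the
    # internal representation, independent of any external encoding.
    text = text.lower()
    # Encode once, at the output edge.
    return text.encode(out_charset)


# The same input can be written out in different external encodings.
latin1_input = "BJØRN".encode("iso8859-1")
utf8_output = handle_request(latin1_input, "iso8859-1", "utf-8")
gb_output = handle_request(latin1_input, "iso8859-1", "gb18030")

# Lowercasing the raw bytes only touches ASCII letters, so the
# non-ASCII "Ø" byte is left as it was.
bytes_lowered = latin1_input.lower()
```

Note that the code in the middle never needs to be told which charset is in use; only the two edges do. That is what goes wrong in the scenario from point 2, where encoded strings travel between modules and every module has to be told the charset.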
3) Doing Unicode is (I think) worthwhile, but it is a tradeoff: everyone
suddenly has to understand and deal with character encoding issues, and
there's a slight performance penalty. It's practically impossible to
have Unicode without making these tradeoffs. (That said, many
environments have made these tradeoffs successfully, e.g., Java, C#.)
Only doing decoding/encoding at the I/O edges reduces the pain, however.

Rgds,
Bjorn
