Malcolm Tredinnick wrote:
> A couple of comments on the patch itself. I realise it's only a proof of
> concept at the moment, so take as more things to think about when you
> want to tidy it up:
>
> (1) A docstring like """needed to workaround the cgi.parse_sql
> unicode-problem""" is not very future-proof. *What* parse_sql unicode
> problem? How will we know if/when it goes away? Either a quick
> description of the problem or a URL if it's tricky and explained
> elsewhere will help people who need to read this code in six months
> time.
ok

> (2) You can't necessarily assume the environment is always in ASCII (or
> maybe you can; see below). For example, my current locale is set to
> en_AU.UTF-8 and I can do
>
>     export foo="€50,00"
>
> If I'm not careful when parsing os.environ['foo'] this comes out as
> rubbish (I need to do unicode(os.environ['foo'], 'utf-8') or similar).
>
> Probably some playing around with the locale module to work out the
> right behaviour and getting a few people to test things (e.g. Windows
> vs. Linux vs. Macs, etc) will be necessary. It's also important not to
> go too overboard here, but since arbitrary environment variables can be
> set through Apache, we need to be able to work with that to be
> "correct". Hmm ... what are the restrictions on what webservers can put
> in their config files? Maybe ASCII-only is reasonable. *shrug*
>

phew... the immortal how-tolerant-should-we-be-when-doing-unicode-conversion
problem :-)

i generally prefer to do as little guesswork as possible, but in the case
of the environ variables it seems we cannot avoid it. after all, parsing
the environ variables must not crash, because the programmer has no way to
control what they contain.

so what do you think about the following approach:

try ascii-decoding
if that fails, try utf-8-decoding
if that fails, do iso-8859-1-decoding (this cannot fail)

?

but imho this should happen only in "special" cases like the environ
variables. for get/post params, for example, i would prefer to raise an
exception when the data cannot be en/de-coded using the configured charset.

> Maybe more investigation needed here.
>
> (3) I know there are some software projects apparently using unicodize
> as a word, but ... *shudder*. Using "code" as an analogy, "unicodify"
> would be nicer (nobody uses "codize", I would hope).
>

ok

> (4) As you go through this process, keep a list somewhere of what people
> need to do to port existing applications across to using this
> functionality.
> Ideally, the answer would be "not much" and we can cast
> from the default encoding to unicode internally where necessary. But I'm
> sure there will be some changes required, so keeping a list of things to
> watch out for as you go will help people test this for you.
>

will try.

gabor
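
p.s. the decoding fallback proposed above for the environ variables (ascii, then utf-8, then iso-8859-1) could be sketched roughly like this. it is only an illustration, not code from the patch: the function names decode_environ_value and decode_request_param are hypothetical, and the sketch assumes the raw values arrive as byte strings (Python 3 style bytes). note that ascii is a subset of utf-8, so the first step is kept only to mirror the order described above.

```python
def decode_environ_value(raw: bytes) -> str:
    """Tolerant decoding for values we cannot control (hypothetical helper).

    Tries ascii, then utf-8, and finally falls back to iso-8859-1.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # iso-8859-1 maps every possible byte to a code point,
    # so this last step cannot raise.
    return raw.decode("iso-8859-1")


def decode_request_param(raw: bytes, charset: str) -> str:
    """Strict decoding for get/post params (hypothetical helper).

    No guessing here: a UnicodeDecodeError propagates to the caller
    when the data does not match the configured charset.
    """
    return raw.decode(charset)
```

the split into two functions reflects the distinction made above: guess only where the programmer has no control over the input, and fail loudly everywhere else.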