Jacob Kaplan-Moss wrote:
> On Jun 28, 2006, at 6:07 AM, Gábor Farkas wrote:
>> what i think we are missing the most is to hear about the "main"
>> developers (project owners?) (adrian, malcolm, jacob etc.) opinion
>> about unicode-ification. if they think we should switch django
>> completely to unicode, then fine. but if they think that django
>> should still support bytestrings, i really don't see how we could do
>> the unicode-ification without breaking backwards compatibility.
>
> In a nutshell: I think it's too much work, with too many
> backwards-incompatible changes, with too little payoff.
>
> Let me expand a bit on each of those points:
>
> "Too much work..." -- there's quite a bit that would need to be
> changed, and a number of sticky problems to be solved. Just one
> example is the issue of template encodings -- do we need to start
> indicating that a certain template is UTF-8 or whatever?

And then there's letting the database know wtf is going into it. And
rich text editors, and third-party libs, and... Unicode is just hard
work. Why just the other day :) I had to fix up an FCKEditor
installation - every time you entered a question mark it got converted
to an omega or a euro symbol after being saved. Merit points to anyone
who can figure out what happened there... *

> "... with too many backwards-incompatible changes ..." -- as Hugo
> points out, this will break a lot of existing code. My experience is
> that Unicode issues are the worst types of bugs since they only crop
> up when dealing with particular data.

My experience is similar, but also that Unicode/encoding issues crop up
where you have libraries that take different approaches to (or make
different assumptions about) either the encoding or whether the thing
being passed in is a str or unicode object. Managing inter-module
clashes is harder than scrubbing incoming data - I have it down to the
cost of doing business with Python at this point.

> "... with too little payoff." -- right now it's completely possible
> to nicely handle Unicode data in Django as long as you're careful.
> Yes, it's not as easy as it might be, but the net result of a
> Unicode-ification would be an incremental improvement at best.
>
> So I think -- for now -- there are more important places to spend our
> energy.

Actually, now's a good time to do it. So long as Django is a closed
world, it's a manageable problem. I suspect being full stack is one
reason why this is not biting people hard atm. Once people start
building modules and plugins on top, it'll be damn hard to do the right
thing later on. However there will be a lot of people with incentive to
help out at that point :)

cheers
Bill

* The FCKEditor file should have been stored as UTF-16 (ff fe) to handle
things like the euro symbol, but it had been down-converted to 8-bit at
some point - all the symbols were remapped to '?'
so questions were replaced with whatever HTML entity got pulled out of
the lookup.
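
For anyone who wants to see the kind of lossy down-conversion the footnote describes, here is a minimal sketch in present-day Python 3 (not the Python 2 of this thread). The sample string and the choice of Latin-1 as the 8-bit target are assumptions made for illustration; the actual FCKEditor file and the tool that converted it are not known.

```python
# Hypothetical sample text containing a euro sign and an omega;
# it is not taken from the FCKEditor file.
text = "Price: 5 \u20ac or 3 \u03a9?"

# Lossless: UTF-16 can represent every character here. Python's "utf-16"
# codec writes a byte order mark first (ff fe on little-endian machines,
# fe ff on big-endian ones).
utf16_bytes = text.encode("utf-16")
assert utf16_bytes.decode("utf-16") == text

# Lossy: Latin-1 (the assumed 8-bit target) has no euro sign or omega,
# so errors="replace" substitutes "?" for each of them.
latin1_bytes = text.encode("latin-1", errors="replace")
damaged = latin1_bytes.decode("latin-1")

print(damaged)  # Price: 5 ? or 3 ??
```

After the second step the original symbols are indistinguishable from real question marks, which is how a lookup keyed on those symbols could end up matching ordinary questions.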
