Re: Unicodification of Django

Bill de hÓra Thu, 29 Jun 2006 09:37:21 -0700

Jacob Kaplan-Moss wrote:
> On Jun 28, 2006, at 6:07 AM, Gábor Farkas wrote:
>> what i think we are missing the most is to hear about the "main"
>> developers (project owners?) (adrian, malcolm, jacob etc.) opinion  
>> about
>> unicode-ification. if they think we should switch django completely to
>> unicode, then fine. but if they think that django should still support
>> bytestrings, i really don't see how we could do the unicode-ification
>> without breaking backwards compatibility.
> 
> In a nutshell: I think it's too much work, with too many backwards- 
> incompatible changes, with too little payoff.
> 
> Let me expand a bit on each of those points:
> 
> "Too much work..." -- there's quite a bit that would need to be  
> changed, and a number of sticky problems to be solved.  Just one  
> example is the issue of template encodings -- do we need to start  
> indicating that a certain template is UTF-8 or whatever?


And then there's letting the database know wtf is going into it. And 
rich text editors, and third party libs, and... Unicode is just hard work.

Why just the other day :) I had to fix up an FCKEditor installation - 
every time you entered a question mark it got converted to an omega or a 
euro symbol after being saved. Merit points to anyone who can figure out 
what happened there... *


> "... with too many backwards-incompatible changes ..." -- as Hugo  
> points out, this will break a lot of existing code.  My experience is  
> that Unicode issues are the worst types of bugs since they only crop  
> up when dealing with particular data.

My experience is similar, but also that Unicode/Encoding issues crop up 
where you have libraries that have different approaches (or assumptions) 
about either the encoding or whether the thing being passed in is a str 
or unicode object.  Managing inter-module clashes is harder than 
scrubbing incoming data - I put it down to the cost of doing business 
with Python at this point.
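
To make the inter-module clash concrete, here is a minimal sketch in Python 3 terms, where the old str/unicode split corresponds to bytes/str. The two "libraries" and the to_text helper are hypothetical stand-ins, not part of Django or any real package, and UTF-8 is an assumed encoding. In Python 2 the same mix would decode implicitly as ASCII and fail only on non-ASCII data, which is why these bugs tend to appear only with particular input.

```python
# Hypothetical helper and "libraries": illustrative names only.

def to_text(value, encoding="utf-8"):
    """Return text (str); decode bytes using the assumed encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def library_a():
    # Stand-in for a module that hands back encoded bytes.
    return "Gábor".encode("utf-8")


def library_b():
    # Stand-in for a module that hands back decoded text.
    return "Farkas"


# Mixing the two directly fails: bytes + str raises TypeError in Python 3.
try:
    library_a() + library_b()
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Normalising at the boundary makes the two modules interoperate.
full_name = to_text(library_a()) + " " + to_text(library_b())
```

The design choice is to convert once, at the point where data crosses from one module to another, so that everything downstream sees a single string type.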


> "... with too little payoff." -- right now it's completely possible  
> to nicely handle Unicode data in Django as long as you're careful.   
> Yes, it's not as easy as it might be, but the net result of a Unicode- 
> ification would be an incremental improvement at best.
> 
> So I think -- for now -- there are more important places to spend our  
> energy.

Actually, now's a good time to do it. So long as Django is a closed 
world, it's a manageable problem. I suspect being full stack is one 
reason why this is not biting people hard atm. Once people start 
building modules and plugins on top, it'll be damn hard to do the right 
thing later on. However there will be a lot of people with incentive to 
help out at that point :)

cheers
Bill

* The FCKEditor file should have been stored as UTF-16 (BOM ff fe) to handle 
things like the euro symbol, but it had been down-converted to 8-bit at 
some point. All the symbols were remapped to '?', so question marks were 
replaced with whatever HTML entity got pulled out of the lookup.
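
The first half of that failure, distinct symbols collapsing to '?', can be reproduced with a lossy down-conversion. This is only a sketch: Latin-1 is an assumed 8-bit target (the original says only "8bit"), and it does not reproduce FCKEditor's entity lookup.

```python
# Sketch of a lossy 8-bit down-conversion; Latin-1 is an assumed target.
symbols = ["€", "Ω", "?"]

# errors="replace" substitutes '?' for anything the target encoding lacks,
# so the euro sign and omega become indistinguishable from a real '?'.
down_converted = [s.encode("latin-1", errors="replace") for s in symbols]

# After the round trip, the original symbols cannot be told apart.
all_collapsed = len(set(down_converted)) == 1
```

Once every symbol in a lookup table has become '?', a genuine question mark matches one of those entries, which is consistent with the behaviour described above.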

