#12849: django's development server raises an encoding exception when trying to
colorize non-ascii text
------------------------------------------------+---------------------------
          Reporter:  jype                       |         Owner:  nobody
            Status:  closed                     |     Milestone:  1.2   
         Component:  django-admin.py runserver  |       Version:  SVN   
        Resolution:  fixed                      |      Keywords:        
             Stage:  Accepted                   |     Has_patch:  1     
        Needs_docs:  0                          |   Needs_tests:  0     
Needs_better_patch:  0                          |  
------------------------------------------------+---------------------------
Comment (by russellm):

 Replying to [comment:13 kmtracey]:
 > I'm not sure the committed fix is the best alternative, though. When
 printing out management command errors I think it would be better to use
 sys.stderr.encoding, if it exists, since that is where we are sending the
 output. Windows for example won't be using utf-8 as the terminal encoding
 (by default), so the fix as committed will result in unreadable output for
 non-ASCII exception data on Windows. Better than an exception I suppose
 but I think it would be better to attempt to use the right encoding, and
 also specify replace rather than strict for error handling so that if the
 data can't be encoded in the target charset then we still don't raise an
 exception.

 There are two issues here.

 Firstly, whatever encoding we choose, there are going to be problems. On
 platforms where stderr is ASCII (or equivalent), there is no reliable way
 to print non-ASCII characters. So, we need to choose ignore/replace (to
 fail silently) or strict (which will raise the same errors being reported
 by this ticket). So on ASCII terminals, we're always going to have
 problems -- it's just a matter of how we hide (or raise) the errors. That
 said, for the record, my ANSI_X3.4-1968 test box actually manages to print
 the special Unicode characters correctly as long as the bytestring is
 encoded as UTF-8. This is why I checked in the patch I did.
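
 The trade-off between the error handlers can be sketched as follows. This
 is an illustration only, written for Python 3 (the ticket itself concerns
 Python 2); the sample string is the one from the test case in this
 ticket, and try_encode is a hypothetical helper, not part of Django.

```python
# Sample text containing two non-ASCII characters.
text = "hello\u00c2\u00c3"


def try_encode(value, encoding, errors):
    """Return the encoded bytes, or the exception name if encoding fails."""
    try:
        return value.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


# strict: raises, which is the failure mode reported in this ticket.
strict_result = try_encode(text, "ascii", "strict")

# replace / ignore: never raise, but lose information on an ASCII target.
replace_result = try_encode(text, "ascii", "replace")
ignore_result = try_encode(text, "ascii", "ignore")

# UTF-8 can represent every character, so no handler is ever triggered.
utf8_result = try_encode(text, "utf-8", "strict")

print(strict_result)   # UnicodeEncodeError
print(replace_result)  # b'hello??'
print(ignore_result)   # b'hello'
print(utf8_result)     # b'hello\xc3\x82\xc3\x83'
```

 Whether the UTF-8 bytes then display correctly depends on the terminal,
 not on Python, which is the point of disagreement in this thread.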

 Secondly, if we do anything in this area, we're going to need to audit
 pretty much all the current management commands. At the moment, there is a
 certain amount of confusion regarding how and when text is encoded for
 display; for example, sqlall builds everything in unicode, then encodes
 it to UTF-8 before returning the value to the base command framework,
 which prints it. We would be well served to do a full teardown here and
 ensure we keep unicode right up to the last moment -- but that's a much
 bigger patch.
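
 A minimal sketch of what "keep unicode right up to the last moment" could
 look like, assuming Python 3. Both build_sql_lines and emit are
 hypothetical names invented for this illustration; they are not Django's
 management command API. The command returns text only, and a single
 output function is the one place where text becomes bytes.

```python
import io


def build_sql_lines(column_name):
    # Hypothetical stand-in for a command such as sqlall.
    # It returns text and never encodes anything itself.
    return [
        'CREATE TABLE "demo" (',
        '    "%s" varchar(10)' % column_name,
        ");",
    ]


def emit(lines, raw_stream, encoding, errors="replace"):
    # The only place where text is converted to bytes. The encoding is a
    # parameter so the caller can pass the target stream's encoding.
    data = ("\n".join(lines) + "\n").encode(encoding, errors)
    raw_stream.write(data)
    return data


lines = build_sql_lines("hello\u00c2\u00c3")

ascii_out = io.BytesIO()
emit(lines, ascii_out, "ascii")   # lossy, but does not raise

utf8_out = io.BytesIO()
emit(lines, utf8_out, "utf-8")    # lossless
```

 With this split, changing the encoding policy means touching emit only,
 rather than auditing each command separately.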

 [12849] fixed every observable case that I could generate, on UTF-8 and
 ASCII terminals. I have no doubt that there are cp1252 or KOI8-R terminals
 that will still have problems, but to address those we'll need some new
 test cases (and platforms on which to test them).

 > Which brings me to: I'd still like to understand when this problem crops
 up, for the colorize case. Unicode query string parms are percent-encoded
 in output -- under what circumstances is the server being asked to
 colorize unicode data containing non-ASCII characters?

 The test case I've been using is to have a database model with a field
 that has db_column=u'hello\u00c2\u00c3'. This is output as the column name
 under sqlall, which broke when it was passed into str() at the start of
 colorize.
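
 The failure can be approximated in Python 3 as below. Under Python 2,
 calling str() on a unicode value performs an implicit strict ASCII
 encode, which py2_style_str imitates here. colorize_sketch is a
 simplified, hypothetical stand-in that encodes explicitly to UTF-8 before
 adding escape codes; it is not Django's actual colorize().

```python
column = "hello\u00c2\u00c3"


def py2_style_str(value):
    # Approximates Python 2's str(unicode_value): strict ASCII encoding.
    return value.encode("ascii")


def colorize_sketch(text, encoding="utf-8"):
    # Hypothetical stand-in: encode explicitly, then wrap the bytes in
    # the ANSI "bold" and "reset" escape sequences.
    return b"\x1b[1m" + text.encode(encoding) + b"\x1b[0m"


try:
    py2_style_str(column)
    failed = False
except UnicodeEncodeError:
    failed = True

colored = colorize_sketch(column)
```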

-- 
Ticket URL: <http://code.djangoproject.com/ticket/12849#comment:14>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
