#13758: MySQLdb utf8_bin and django causes UnicodeDecodeError
----------------------------------------------------------+-----------------
 Reporter:  sam.vev...@gmail.com                          |       Owner:  nobody
   Status:  new                                           |   Milestone:
Component:  Database layer (models, ORM)                  |     Version:  1.2
 Keywords:  utf8_binMySQLdb collation unicode bytestring  |       Stage:  Unreviewed
Has_patch:  0                                             |
----------------------------------------------------------+-----------------
 Issue:
 I have a model with a FileField. When I delete instances of that
 model whose filenames contain unicode characters, I get:

 {{{
 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in
 range(128)
 }}}

 I finally traced the problem back to my database collation: utf8_bin. I
 chose utf8_bin so I could order the strings in a case-sensitive manner.
 FYI, with a utf8_bin collation MySQLdb does not return Python unicode
 strings; it returns UTF-8 bytestrings. For a brief description of that
 issue see:
 http://code.djangoproject.com/ticket/8340#comment:4

 The traceback from my exception reveals the exception being thrown in
 "django/db/models/fields/files.py" in get_prep_value (line 248).
 FileField is a subclass of Field, but uses the same backend
 MySQL type (varchar) as a CharField. However, it seems that FileField
 and CharField have completely different implementations of
 get_prep_value.

 Here is CharField's implementation:
     def to_python(self, value):
         if isinstance(value, basestring) or value is None:
             return value
         return smart_unicode(value)

     def get_prep_value(self, value):
         return self.to_python(value)

 Here is FileField's:
     def get_prep_value(self, value):
         "Returns field's value prepared for saving into a database."
         # Need to convert File objects provided via a form to unicode for
         # database insertion
         if value is None:
             return None
         return unicode(value)

 When I replace FileField's implementation of get_prep_value with
 CharField's implementation, the exception goes away. The issue is that
 Python's default encoding is ascii, so unicode() called on a UTF-8 byte
 string containing non-ASCII bytes raises UnicodeDecodeError. The
 CharField implementation instead checks whether the value is an
 instance of basestring and, if so, passes it through unchanged.
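
 To make the failure mode concrete, here is a minimal sketch that runs
 without Django or MySQLdb. It assumes the value coming back from the
 database is a UTF-8 byte string. The function names
 implicit_ascii_decode and prep_value_tolerant are hypothetical and are
 not part of Django; the second one only illustrates one possible
 direction for a fix, not a proposed patch.

```python
# -*- coding: utf-8 -*-
# Sketch only; written to run on both Python 2 and Python 3.
# None of these names are Django APIs.

# UTF-8 bytes for u"caf\xe9.txt", standing in for what MySQLdb is
# reported to return for a column with a utf8_bin collation.
raw = u"caf\xe9.txt".encode("utf-8")


def implicit_ascii_decode(value):
    """Mimic Python 2's unicode(bytestring), which decodes using the
    default encoding (ascii) and fails on any byte >= 0x80."""
    return value.decode("ascii")


def prep_value_tolerant(value, encoding="utf-8"):
    """Hypothetical get_prep_value variant: decode byte strings with an
    explicit encoding instead of relying on the ascii default."""
    if value is None:
        return None
    if isinstance(value, bytes):  # bytes is str on Python 2
        return value.decode(encoding)
    # Anything else (e.g. a File-like object) is converted to text.
    return u"%s" % (value,)


try:
    implicit_ascii_decode(raw)
except UnicodeDecodeError as exc:
    print("implicit decode failed: %s" % exc)

print(repr(prep_value_tolerant(raw)))
```

 The explicit encoding argument is an assumption: UTF-8 matches the
 connection charset described in this ticket, but a real fix would need
 to take the encoding from the database connection settings.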

-- 
Ticket URL: <http://code.djangoproject.com/ticket/13758>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
