#13758: MySQLdb utf8_bin and django causes UnicodeDecodeError
---------------------------------------------------+------------------------
          Reporter:  sam.vev...@gmail.com          |         Owner:  nobody
            Status:  new                           |     Milestone:
         Component:  Database layer (models, ORM)  |       Version:  1.2
        Resolution:                                |      Keywords:  utf8_bin MySQLdb collation unicode bytestring
             Stage:  Accepted                      |     Has_patch:  1
        Needs_docs:  0                             |   Needs_tests:  1
Needs_better_patch:  0                             |
---------------------------------------------------+------------------------
Changes (by russellm):

  * has_patch:  0 => 1
  * needs_tests:  0 => 1
  * stage:  Unreviewed => Accepted

New description:

 Issue:
 I have a Model with a FileField. When I delete instances of that model
 whose filenames contain non-ASCII characters, I get:

 {{{
 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in
 range(128)
 }}}

 I finally traced the problem back to my database collation, utf8_bin,
 which I chose so that strings would be ordered case-sensitively. With a
 utf8_bin collation, MySQLdb does not return Python unicode strings; it
 returns UTF-8 bytestrings. For a brief description of that issue, see:
 http://code.djangoproject.com/ticket/8340#comment:4
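
 The failure can be reproduced without Django. Below is a minimal Python 3
 sketch, assuming that Python 2's unicode() applied to a bytestring behaves
 like bytes.decode('ascii') when the default encoding is 'ascii'. The
 filename and the helper python2_style_unicode are hypothetical, chosen so
 the first non-ASCII byte sits at position 18 as in the reported error.

```python
# Python 3 analogue of the Python 2 failure (assumption: unicode(bytestring)
# decodes with sys.getdefaultencoding(), which is 'ascii' by default).

# Hypothetical filename as MySQLdb might return it under utf8_bin:
# raw UTF-8 bytes rather than a unicode string.
raw_name = "uploads/documents/é.txt".encode("utf-8")


def python2_style_unicode(value: bytes) -> str:
    """Hypothetical helper mimicking Python 2's unicode(value) with 'ascii'."""
    return value.decode("ascii")


try:
    python2_style_unicode(raw_name)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
    print(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
print(raw_name.decode("utf-8"))  # uploads/documents/é.txt
```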

 The traceback shows the exception being raised in get_prep_value in
 "django/db/models/fields/files.py" (line 248). FileField is a subclass
 of Field, but it maps to the same MySQL column type (varchar) as
 CharField. However, FileField and CharField have completely different
 implementations of get_prep_value.

 Here is CharField's implementation:
 {{{
     def to_python(self, value):
         if isinstance(value, basestring) or value is None:
             return value
         return smart_unicode(value)

     def get_prep_value(self, value):
         return self.to_python(value)
 }}}

 Here is FileField's:
 {{{
     def get_prep_value(self, value):
         "Returns field's value prepared for saving into a database."
         # Need to convert File objects provided via a form to unicode for
         # database insertion
         if value is None:
             return None
         return unicode(value)
 }}}

 My experiments showed that if I replace FileField's implementation of
 get_prep_value with CharField's, the exception goes away. The problem is
 that Python's default encoding is ASCII, so calling unicode() on a UTF-8
 bytestring fails. CharField's implementation simply checks whether the
 value is an instance of basestring and passes it through unchanged.
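
 The contrast between the two strategies can be sketched outside Django in
 Python 3, treating str/bytes as stand-ins for Python 2's unicode/str. All
 names below (FakeFieldFile, charfield_style_prep, filefield_style_prep,
 utf8_aware_prep) are hypothetical, and utf8_aware_prep is only one
 possible fix assumed for illustration, not the patch attached to this
 ticket.

```python
# Standalone Python 3 sketch (not Django code). str/bytes stand in for
# Python 2's unicode/str; every name here is hypothetical.


class FakeFieldFile:
    """Hypothetical stand-in for a File object whose str() is its name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __str__(self) -> str:
        return self.name


def charfield_style_prep(value):
    # Mirrors CharField: any string type (like Python 2's basestring)
    # passes through untouched; other objects are coerced.
    if isinstance(value, (str, bytes)) or value is None:
        return value
    return str(value)


def filefield_style_prep(value):
    # Mirrors FileField: always coerce. For bytes, emulate Python 2's
    # unicode(value), which decodes with the default 'ascii' codec.
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("ascii")
    return str(value)


def utf8_aware_prep(value):
    # Hypothetical fix: decode bytestrings as UTF-8 (the encoding MySQLdb
    # used to produce them) and still coerce File objects via str().
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


raw = "uploads/é.txt".encode("utf-8")
print(charfield_style_prep(raw))  # bytes passed through unchanged
print(utf8_aware_prep(raw))       # decoded text
try:
    filefield_style_prep(raw)
except UnicodeDecodeError as exc:
    print("FileField-style coercion failed:", exc)
```

 Passing bytes through, as CharField does, avoids the exception but hands a
 bytestring to the database layer; decoding explicitly, as in the
 hypothetical utf8_aware_prep, always yields text while keeping the File
 object coercion that FileField needs.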

-- 
Ticket URL: <http://code.djangoproject.com/ticket/13758#comment:2>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.