#13758: MySQLdb utf8_bin and django causes UnicodeDecodeError
---------------------------------------------------+------------------------
          Reporter:  sam.vev...@gmail.com          |       Owner:  nobody
            Status:  new                           |   Milestone:
         Component:  Database layer (models, ORM)  |     Version:  1.2
        Resolution:                                |    Keywords:  utf8_bin MySQLdb collation unicode bytestring
             Stage:  Accepted                      |   Has_patch:  1
        Needs_docs:  0                             | Needs_tests:  1
Needs_better_patch:  0                             |
---------------------------------------------------+------------------------

Changes (by russellm):
 * has_patch:  0 => 1
 * needs_tests:  0 => 1
 * stage:  Unreviewed => Accepted

New description:

Issue:
I have a Model with a FileField.
When I delete instances of that model whose filenames contain non-ASCII characters, I get:

{{{
'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
}}}

I finally traced the problem back to my database collation: utf8_bin. I chose utf8_bin so I could order the strings in a case-sensitive manner. FYI, MySQLdb does not return Python unicode strings for a utf8_bin collation; it returns UTF-8 bytestrings. For a brief description of that issue, see:
http://code.djangoproject.com/ticket/8340#comment:4

The traceback shows the exception being raised in "django/db/models/fields/files.py" in get_prep_value (line 248). FileField subclasses Field rather than CharField, yet it maps to the same MySQL column type (varchar) as CharField. However, FileField and CharField have completely different implementations of get_prep_value.

Here is CharField's implementation:
{{{
def to_python(self, value):
    if isinstance(value, basestring) or value is None:
        return value
    return smart_unicode(value)

def get_prep_value(self, value):
    return self.to_python(value)
}}}

Here is FileField's:
{{{
def get_prep_value(self, value):
    "Returns field's value prepared for saving into a database."
    # Need to convert File objects provided via a form to unicode for database insertion
    if value is None:
        return None
    return unicode(value)
}}}

My experiments showed that if I replace FileField's get_prep_value with CharField's implementation, the exception goes away. The issue is that the default encoding is ascii, so unicode() called on a UTF-8 byte str blows up. The CharField implementation simply checks whether the value is an instance of basestring and, if so, quietly passes it through.

--
Ticket URL: <http://code.djangoproject.com/ticket/13758#comment:2>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
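The following is a minimal sketch of the failure described in this ticket, plus two ways of avoiding it. It is written in Python 3 for readability, so it models Python 2's implicit `unicode(bytestr)` (which decodes with the default 'ascii' codec) as an explicit `bytes.decode("ascii")`. This modelling is an assumption. The helper names (`py2_unicode`, `filefield_get_prep_value`, `charfield_get_prep_value`, `patched_get_prep_value`) and the filename are hypothetical stand-ins, not Django APIs.

```python
# Python 3 sketch modelling the Python 2 behaviour described in ticket #13758.
# ASSUMPTION: all helper names below are hypothetical stand-ins, not Django APIs.


def py2_unicode(value):
    """Mimic Python 2's unicode(value) when the default encoding is 'ascii'."""
    if isinstance(value, bytes):
        # Python 2 implicitly decodes byte strings with the default 'ascii' codec.
        return value.decode("ascii")
    return str(value)


def filefield_get_prep_value(value):
    """Mirror of FileField.get_prep_value as quoted in the ticket."""
    if value is None:
        return None
    return py2_unicode(value)


def charfield_get_prep_value(value):
    """Mirror of CharField.to_python/get_prep_value: strings pass through untouched."""
    # (str, bytes) plays the role of Python 2's basestring.
    if isinstance(value, (str, bytes)) or value is None:
        return value
    return str(value)


def patched_get_prep_value(value, encoding="utf-8"):
    """Hypothetical alternative: decode byte strings explicitly as UTF-8."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Made-up filename as MySQLdb would return it under utf8_bin: a UTF-8 byte string.
raw_name = "uploads/résumé.pdf".encode("utf-8")

try:
    filefield_get_prep_value(raw_name)
except UnicodeDecodeError as exc:
    print("FileField-style:", exc)

print("CharField-style:", charfield_get_prep_value(raw_name))
print("Patched:", patched_get_prep_value(raw_name))
```

The CharField-style version avoids the crash only by returning the byte string unchanged. The patched variant instead decodes explicitly as UTF-8, which is roughly what Django's `smart_unicode` does with its default `encoding='utf-8'`.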