On Thu, Nov 26, 2009 at 7:03 AM, Hinnack <henrik.gens...@googlemail.com> wrote:

> Hi Karen,
> thanks again for your reply.
> I use Aptana with pydev extension.
> Debugging the app shows the following for search:
> dict: {u'caption': u'f\\xfcr', u'showold': False}

That's confusing to me because, apart from the extra \ (which could be an
artifact of how the debugger displays it), that looks like a correctly built
unicode object for für.

> and for qs:
> str: für
> although it seems to be &#65533; instead of ASCII 252 - but this could be,
> because I am sitting on a MAC
> while debugging.

Using python manage.py shell might shed more light. I fear the debugger here
is assuming an incorrect bytestring encoding and getting in the way.
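
One way to take the display tool out of the picture is to look at code points and byte values instead of the rendered string. Here is a minimal sketch, written for Python 3 (where str is text and bytes is a bytestring) rather than the Python 2 used in this thread. The describe helper is a hypothetical name for illustration, not something from Django or pydev:

```python
def describe(value):
    """Return an unambiguous, ASCII-only description of text or bytes."""
    if isinstance(value, bytes):
        # Show each byte as two hex digits.
        return "bytes: " + " ".join("%02X" % b for b in value)
    # Show each character as its Unicode code point.
    return "text: " + " ".join("U+%04X" % ord(ch) for ch in value)


print(describe("f\u00fcr"))
print(describe("f\u00fcr".encode("utf-8")))
```

Because the output is pure ASCII, it should look the same no matter what encoding the terminal or IDE assumes.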

I cannot recreate anything like what you are seeing.  I have a model Thing
stored in a MySQL DB (using a utf-8 encoded table) with CharField name.
There are two instances of this Thing in the DB that contain für in the
name.  From a python manage.py shell, using Django 1.1.1:

>>> from ttt.models import Thing
>>> import django
>>> django.get_version()
'1.1.1'
>>> ufur = u'f\u00fcr'
>>> print ufur
für
>>> ufur
u'f\xfcr'
>>> ufur.encode('utf-8')
'f\xc3\xbcr'
>>> ufur.encode('iso-8859-1')
'f\xfcr'

Small u with umlaut is U+00FC. Encoded in utf-8 it takes the 2 bytes C3 BC;
encoded in iso-8859-1 it is the single byte FC.
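
The same byte values can be checked directly. This is a small sketch in Python 3 syntax (an assumption on my part, since the session in this thread is Python 2), using only the standard codecs:

```python
ufur = "f\u00fcr"  # 'f', U+00FC (small u with umlaut), 'r'

utf8_bytes = ufur.encode("utf-8")         # U+00FC becomes the two bytes C3 BC
latin1_bytes = ufur.encode("iso-8859-1")  # U+00FC becomes the single byte FC

print(utf8_bytes)    # b'f\xc3\xbcr'
print(latin1_bytes)  # b'f\xfcr'
```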

Filtering with icontains, using either the unicode object or the utf-8
encoded bytestring version, works properly:

>>> Thing.objects.filter(name__icontains=ufur)
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
>>> Thing.objects.filter(name__icontains=ufur.encode('utf-8'))
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]

Attempting to filter with an iso-8859-1 encoded bytestring raises an error:

>>> Thing.objects.filter(name__icontains=ufur.encode('iso-8859-1'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/django/db/models/manager.py", line 129, in filter
    return self.get_query_set().filter(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 498, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/django/db/models/query.py", line 516, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1675, in add_q
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/query.py", line 1614, in add_filter
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 56, in add
    obj, params = obj.process(lookup_type, value)
  File "/usr/lib/python2.5/site-packages/django/db/models/sql/where.py", line 269, in process
    params = self.field.get_db_prep_lookup(lookup_type, value)
  File "/usr/lib/python2.5/site-packages/django/db/models/fields/__init__.py", line 214, in get_db_prep_lookup
    return ["%%%s%%" % connection.ops.prep_for_like_query(value)]
  File "/usr/lib/python2.5/site-packages/django/db/backends/__init__.py", line 364, in prep_for_like_query
    return smart_unicode(x).replace("\\", "\\\\").replace("%", "\%").replace("_", "\_")
  File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 44, in smart_unicode
    return force_unicode(s, encoding, strings_only, errors)
  File "/usr/lib/python2.5/site-packages/django/utils/encoding.py", line 92, in force_unicode
    raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected end of data. You passed in 'f\xfcr' (<type 'str'>)

This is because Django assumes the bytestring is utf-8 encoded, and runs
into trouble attempting to convert to unicode specifying utf-8 as the
string's encoding, since it is not valid utf-8 data.
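
The underlying failure can be reproduced without Django. The following is only a sketch in Python 3 syntax (the thread itself uses Python 2). decode_as_utf8 is a hypothetical helper that mimics the "assume utf-8" step, not Django's actual force_unicode:

```python
def decode_as_utf8(raw):
    """Decode raw bytes as utf-8; return None if they are not valid utf-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone FC byte is not a valid utf-8 sequence.
        return None


utf8_bytes = "f\u00fcr".encode("utf-8")
latin1_bytes = "f\u00fcr".encode("iso-8859-1")

print(ascii(decode_as_utf8(utf8_bytes)))    # decodes cleanly
print(ascii(decode_as_utf8(latin1_bytes)))  # None: not valid utf-8
```

Django raises its own error rather than returning None. The sketch only shows that the iso-8859-1 bytes cannot be read as utf-8.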

The only way I have been able to recreate anything like what you are
describing is to incorrectly construct the original unicode object from a
utf-8 bytestring assuming an iso-8859-1 encoding:

>>> badufur = ufur.encode('utf-8').decode('iso-8859-1')
>>> badufur
u'f\xc3\xbcr'
>>> print badufur
fÃ¼r
>>> print badufur.encode('utf-8')
fÃ¼r
>>> print badufur.encode('iso-8859-1')
für

Using that unicode object doesn't produce any hits in the DB:

>>> Thing.objects.filter(name__icontains=badufur)
[]

But encoding it to iso-8859-1 does, because that has the effect of restoring
the original utf-8 bytestring:

>>> Thing.objects.filter(name__icontains=badufur.encode('iso-8859-1'))
[<Thing: für inserted as unicode>, <Thing: für inserted as utf8 bytestring>]
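
The round trip behind this can be sketched without a database. As before, this is Python 3 syntax (an assumption, since the session in this thread is Python 2), and the variable names other than ufur and badufur are mine:

```python
ufur = "f\u00fcr"
utf8_bytes = ufur.encode("utf-8")  # b'f\xc3\xbcr'

# Wrong: read the utf-8 bytes as iso-8859-1. Each byte becomes its own
# character, so the two bytes C3 BC turn into U+00C3 and U+00BC.
badufur = utf8_bytes.decode("iso-8859-1")

# Encoding the bad text back to iso-8859-1 restores the original utf-8 bytes.
restored_bytes = badufur.encode("iso-8859-1")
repaired = restored_bytes.decode("utf-8")

print(ascii(badufur))                # 'f\xc3\xbcr' (4 characters, not 3)
print(restored_bytes == utf8_bytes)  # True
print(repaired == ufur)              # True
```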

However, the debug info you show above doesn't show an incorrectly-built
unicode object, so I'm very confused by it.


