Re: Django / memcached / pickle / Unicode = confusion + UnicodeDecodeError

Malcolm Tredinnick Wed, 14 Jan 2009 18:28:54 -0800

On Tue, 2009-01-13 at 19:30 +0000, Rachel Willmer wrote:
> I've just upgraded some code to Django 1.0 and the caching stopped working.
> 
> I have found a fix but I don't understand what's happening so if
> anyone can explain, I'd be grateful.
> 
> My code used to look something like this (simplified for clearer reading):
> 
> cached=cache.get(key)
> if cached:
>     list=pickle.loads(cached)
> else:
>     list = Banks.objects()
>     pickled = pickle.dumps(list,pickle.HIGHEST_PROTOCOL)
>     cache.set(key,pickled)
> 
> pickled is of type 'str', my default encoding appears to be 'ascii'.
> 
> When I call cache.set, I get a UnicodeDecodeError.


Which cache backend are you using? UnicodeDecodeError occurs when
converting from bytestrings to unicode and, as far as I can see, none of
the set() methods for caches should be doing this.
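To make the direction of that error concrete, here is a minimal sketch. It
is written in Python 3 syntax, where bytes is the bytestring type and str is
the unicode type (the code in this thread ran on Python 2, where the same
conversion could also happen implicitly). The sample list is an assumption
standing in for the real data.

```python
import pickle

# Binary pickle protocols (2 and up) begin with the byte 0x80,
# which lies outside the 7-bit ASCII range.
pickled = pickle.dumps([1, 2, 3], pickle.HIGHEST_PROTOCOL)

try:
    # bytestring -> unicode using the ascii codec
    pickled.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)
```

The error is raised on decoding, not on encoding, which is why it points at
some step that turns the pickled bytestring into a unicode object.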

[...]
> So what I don't understand is, why 'iso_8859_1'? As far as I know, the
> python default encoding is 'ascii' and the django DEFAULT_CHARSET is
> 'utf-8'. So where's this coming from?

It's difficult to be certain about this without knowing where the
contents of "cached" came from originally, but I'll guess it's loaded
from the database.

Right now, Django doesn't have any concept of "binary data" for database
storage. So it treats everything as strings and converts them to unicode
objects upon loading. That the conversion happens is well understood
these days (I hope). Your call to smart_str() essentially says "convert
this from a unicode object back to a str object" and, by using
iso-8859-1, you're indicating that the full range of byte values might
well occur (ascii data only occupies the lower 7 bits, whereas the
binary pickle protocol uses all 8 bits of each byte).
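The 7-bit versus 8-bit difference can be checked directly. This is a small
sketch in Python 3 syntax; the sample list is an assumption, and protocol 0
output is only shown to be pure ASCII for this simple data, not for every
possible object.

```python
import pickle

data = [1, 2, 3]

# Protocol 0 is the older text-based format.
text_pickle = pickle.dumps(data, 0)
# The highest protocol is a binary format.
binary_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# True when every byte fits in the lower 7 bits.
ascii_only = all(b < 0x80 for b in text_pickle)
# True when at least one byte needs the eighth bit.
has_high_bytes = any(b >= 0x80 for b in binary_pickle)

print(ascii_only, has_high_bytes)
```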

In a fashion, you can think of iso-8859-1 as the identity transformation
for bytestrings. In Python, the codec simply maps every byte value to
the Unicode code point with the same number (even those that aren't
valid iso-8859-1 characters). That's for historical reasons, but it's
also guaranteed behaviour and is quite useful as a way to change the
type from unicode to a bytestring.
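A short sketch of that identity behaviour, again in Python 3 syntax (bytes
and str in place of Python 2's str and unicode). The dictionary being
pickled is a made-up stand-in for real cached data.

```python
import pickle

# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# Decoding with iso-8859-1 never fails: byte N becomes code point N.
as_text = all_bytes.decode("iso-8859-1")
code_points = [ord(ch) for ch in as_text]

# Encoding with the same codec gives back the original bytes.
round_tripped = as_text.encode("iso-8859-1")

# The same trick carries a binary pickle through a unicode object.
original = {"name": "Banks", "ids": [1, 2, 3]}
pickled = pickle.dumps(original, pickle.HIGHEST_PROTOCOL)
stored = pickled.decode("iso-8859-1")               # bytestring -> unicode
restored = pickle.loads(stored.encode("iso-8859-1"))  # unicode -> bytestring

print(round_tripped == all_bytes, restored == original)
```

Using the same codec in both directions is what makes this lossless; mixing
it with utf-8 or ascii on one side would change or reject the high bytes.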

That explains why iso-8859-1 can be used to change types (unicode ->
bytestring and back), but it doesn't explain why the original error is
occurring in the first place. I may have missed something when browsing
the code there, though.

Malcolm



Re: Django / memcached / pickle / Unicode = confusion + UnicodeDecodeError