On Fri, 2008-09-12 at 04:44 -0700, Julien Phalip wrote:
> Hi,
> 
> I'm running a fairly large website (10,000 news items). Initially it
> was made in ASP with MSSQL, then I took the project over and ported it
> to PHP and MySQL. Finally, 6 months ago I ported it to Django and
> MySQL.
> 
> Now, ever since the site has been running on Django, I've received
> about a dozen error emails every couple of days (once I even received
> 400 overnight!). Those errors are systematically caused by web
> crawlers (yahoo slurp, googlebot, msn, yeti, etc.). It systematically
> chokes on the same line of code, which is loading some data from file
> caching. The traceback is pretty much always as follows:
> 
>  File "/MYPATH/apps/news/templatetags/news_tags.py", line 13, in
> show_sidebar
>    cached_sidebar = cache.get('the_sidebar')
> 
>  File "/MYPATH/django/core/cache/backends/filebased.py", line 50, in
> get
>    return pickle.load(f)
> 
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position
> 5999: unexpected code byte
> 
> The actual utf-8 character varies each time.

You mean the byte, not the UTF-8 character, since the whole point is
that it isn't a UTF-8 encoding of anything.
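
To illustrate the distinction: a byte such as 0x8a has the bit pattern
10xxxxxx, which UTF-8 reserves for continuation bytes, so it can never
begin a character. The snippet below is only a sketch with made-up sample
data (the `data` value is illustrative, not taken from your cache file),
and it assumes a current Python 3, where the wording of the error message
differs from the one in your traceback.

```python
# Hypothetical sample: three ASCII bytes followed by a lone 0x8a.
data = b"caf\x8a"

try:
    data.decode("utf-8")
    bad_byte = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec could not decode.
    bad_offset = exc.start
    bad_byte = data[bad_offset]

# Continuation bytes have their top two bits set to "10".
is_continuation = bad_byte is not None and (bad_byte & 0xC0) == 0x80
print(hex(bad_byte), bad_offset, is_continuation)
```

The offset reported in the exception is what makes it possible to go back
to the stored bytes and look at what surrounds the offending position.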

> 
> I spent a lot of time cleaning up, reorganising and improving the
> code... in vain. Now I strongly suspect it might be because of the
> data being corrupted in some way.

> 
> Now, what puzzles me is that all the URLs which fail with web
> crawlers, actually work perfectly well when I simply open them in a
> browser.
> 
> The code has always followed a recent trunk of Django and now runs on
> 1.0. I have already raised that issue in this mailing list a couple of
> times in the past, but I didn't get much help. I haven't opened a
> ticket because I cannot reproduce the error myself (it only happens
> with web crawlers) and because I suspect it might be because of my
> setup (none of my other sites that use file caching has this
> problem).

So if I were you, I'd put some extra debugging into Django itself to try
and gather more information. In particular, since it's loading a
specific file at the time of the problem, you could log the file name
and copy the contents somewhere for later investigation
(or log the file contents as well).
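
As a rough sketch of that kind of debugging (this is not Django's code;
the function name `load_pickle_or_quarantine`, the logger name and the
`quarantine_dir` argument are all hypothetical), something along these
lines would record the file name and keep the exact bytes that failed to
load:

```python
import logging
import pickle
from pathlib import Path

logger = logging.getLogger("cache_debug")  # hypothetical logger name


def load_pickle_or_quarantine(path, quarantine_dir):
    """Unpickle the file at `path`.

    If loading fails, save the bytes that were read into `quarantine_dir`,
    log the file name with the traceback, and re-raise the original error.
    """
    path = Path(path)
    raw = path.read_bytes()  # read once so the saved copy matches what failed
    try:
        return pickle.loads(raw)
    except Exception:
        quarantine_dir = Path(quarantine_dir)
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        saved = quarantine_dir / path.name
        saved.write_bytes(raw)
        logger.exception("Could not unpickle %s; bytes saved to %s", path, saved)
        raise
```

Reading the file into memory first means the saved copy is exactly what
the unpickler saw, even if the cache file is rewritten afterwards. The
same idea could be applied by hand around the failing line in your copy
of the file-based backend.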

I can't think of any reason why Django's caching code would cause
this problem: it's (allegedly) pickling valid data and then
unpickling the same data, and I trust Python's pickling process to work.
However, it's also relevant to know *what* you are pickling
here. What type of object is it? Where did the data for that object come
from?
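
One way to answer the "what type" question is to record it at the point
where the value is about to be cached. The helper below is only a sketch
(`describe_pickle_roundtrip` is a hypothetical name, and it assumes the
value supports `==`); it reports the object's type and checks that it
survives a pickle/unpickle round trip in memory, with no file involved.

```python
import pickle


def describe_pickle_roundtrip(value):
    """Return the type of `value`, its pickled size, and whether an
    in-memory pickle/unpickle round trip gives back an equal object."""
    blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    value_type = type(value)
    return {
        "type": f"{value_type.__module__}.{value_type.__qualname__}",
        "pickled_bytes": len(blob),
        "roundtrip_equal": restored == value,
    }


# Example with a made-up sidebar fragment.
info = describe_pickle_roundtrip("<ul><li>News item</li></ul>")
print(info)
```

If the round trip succeeds in memory but the cached file still fails to
load, that would point towards what happens to the bytes on disk rather
than towards pickling itself.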

Regards,
Malcolm



