Hello,

I have some questions regarding django.middleware.cache, which is an
impressive feature of Django. I am trying to understand it better, but some
design decisions are unclear to me. I am sure you had your reasons for doing
it that way, and I would like to know them. ;-)

1) Prefix.

All cache keys are prefixed with 'views.decorators.cache.cache_page.'. It is
a rather long prefix. Why didn't you use something shorter?

It looks as if Django's cache objects were meant to be a tiny part of some
huge shared cache, so you worried about potential clashes. Is this an
original design restriction? Is there a rationale for keeping it in Django
now?

2) Gzip.

The gzip flag is part of the cache key. Additionally, 'Content-Encoding' =
'gzip' is part of the response object. If somebody requests a page but
cannot accept gzip-encoded content, it is trivial to ungzip it. And vice
versa: we can gzip uncompressed data. This can be exploited using several
strategies:

a) Active: every time we put something in the cache, we put 2 objects:
gzipped and uncompressed. Cons: extra work + extra cache space if nobody
wants one of the generated versions.

b) Passive: we keep one version in the cache and generate the counterpart
dynamically. Cons: extra work, which can be compounded if we have the
"wrong" version in the cache.

c) Passive-aggressive: this is a variation of b. We always keep the
compressed version in the cache, saving space and transfer time. Practically
all modern browsers accept compressed content (if I remember correctly,
Opera is a notable exception). For the rest of them we will uncompress on
the fly. I hope it will be a rather rare event.

d) Lazy: this is a variation of a. If we have some version in the cache and
its counterpart is requested, we generate it from the cached version and
save it in the cache as well.

b and d may require an extra lookup => has_key() should be implemented
efficiently. It had better be. As far as I can tell, that is not the case
for memcache. But nevertheless it can shave off time for expensive requests.
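
To make strategy d concrete, here is a minimal sketch of what I have in
mind. It is not Django's code: the plain dict standing in for the cache, the
name get_page and the render callback are all my own assumptions, used only
for illustration.

```python
import gzip


def get_page(cache, url, accepts_gzip, render):
    """Lazy strategy (d): derive a missing encoding from the cached one.

    `cache` is a plain dict standing in for a cache backend (assumption).
    `render` is a hypothetical callback that runs the expensive page
    generation and returns the uncompressed body as bytes.
    """
    wanted = (url, accepts_gzip)
    if wanted in cache:
        return cache[wanted]

    counterpart = (url, not accepts_gzip)
    if counterpart in cache:
        # The extra lookup: reuse the other encoding instead of rendering.
        cached = cache[counterpart]
        body = gzip.compress(cached) if accepts_gzip else gzip.decompress(cached)
    else:
        # Nothing cached at all: render once, then encode as requested.
        plain = render(url)
        body = gzip.compress(plain) if accepts_gzip else plain

    # Save the requested version so the next identical request is a hit.
    cache[wanted] = body
    return body
```

With this, the expensive rendering runs at most once per page, at the price
of a second lookup on a miss.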

Why did you decide to implement a multi-component key, which has the gzip
flag as the last component? Why did you decide to _generate_ content with
the full Django machinery independently for the gzipped and uncompressed
versions?
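
For comparison, strategy c would need only a single key per page, with no
gzip flag in it. A rough sketch, again with a plain dict as the cache and
with hypothetical names (store_page, fetch_page) that are mine and not part
of Django:

```python
import gzip


def store_page(cache, url, plain_body):
    # Always keep only the compressed version under a single key.
    cache[url] = gzip.compress(plain_body)


def fetch_page(cache, url, accepts_gzip):
    """Return (body, headers) for a cached page, or None on a miss."""
    compressed = cache.get(url)
    if compressed is None:
        return None
    if accepts_gzip:
        # Common case: hand the stored bytes over unchanged.
        return compressed, {"Content-Encoding": "gzip"}
    # Rare case: the client cannot accept gzip, so uncompress on the fly.
    return gzip.decompress(compressed), {}
```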

3) 404s.

I've noticed that all responses are cached, including responses with status
code 404, and everything else returned from Django. Was it the original
intention to cache 404s? Why? Was it too expensive to recheck and too
frequent to ignore? What about the other non-200 status codes?
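
To show the alternative I am asking about, here is a small sketch that
caches only selected status codes. The set CACHEABLE_STATUS_CODES and the
function store_response are hypothetical names of mine, and the dict again
stands in for the real cache:

```python
# Assumption: only successful responses are worth caching.
CACHEABLE_STATUS_CODES = {200}


def store_response(cache, key, status_code, body):
    """Cache the response only if its status code is allowed.

    Returns True if the response was stored, False if it was skipped.
    """
    if status_code not in CACHEABLE_STATUS_CODES:
        return False
    cache[key] = (status_code, body)
    return True
```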

I hope someone will educate me on these issues. Thank you in advance,

Eugene



