[Paste] ETags with MD5?

Alec Flett Fri, 10 Aug 2007 12:23:39 -0700

Hey -
We've been using StaticURLParser to serve up our static files (under pylons)
but we've run into a few issues.


One of the biggest is the ETag header. The problem is that we're deploying
across a bunch of machines in a cluster, and static files get pulled out of
SVN, which stamps each file with the time it was checked out. This means
that each machine produces a different ETag for the same file, because each
copy has a different modification date.

Now one option is to fix our deployment to go re-touch the files after
they've been checked out, to match their timestamps in svn.

But another option is to do MD5-based ETags... so the tag really reflects
the contents of the file, not the date that happens to be on disk.

This brings up a few problems with StaticURLParser/fileapp:
- Headers are not really configurable. Frankly, a last-ditch effort would be
to stop including the ETag header altogether, but Paste makes this
extraordinarily hard - I can monkeypatch it, but it's really not pretty. I
can write middleware, but that seems like overkill just to remove a
header.
- Really, if the ETag was the MD5 of the file, then the etag would be
consistent across the cluster. This technique is described here:
http://dev.aol.com/implementing-atom-publishing-protocol-python-wsgi
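
For reference, the "middleware just to remove a header" route from the first
point can be sketched in plain WSGI, without touching Paste internals. This
is only an illustration: strip_etag and demo_app are hypothetical names, not
part of Paste, and the sketch assumes a modern Python and a wrapped app that
sets its headers through start_response in the usual way.

```python
def strip_etag(app):
    """Wrap a WSGI app and drop any ETag header from its responses."""
    def middleware(environ, start_response):
        def filtered_start_response(status, headers, exc_info=None):
            # Header names are case-insensitive, so compare in lowercase.
            kept = [(name, value) for name, value in headers
                    if name.lower() != "etag"]
            return start_response(status, kept, exc_info)
        return app(environ, filtered_start_response)
    return middleware


def demo_app(environ, start_response):
    """Hypothetical stand-in for a static file app that sets an ETag."""
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("ETag", '"abc"')])
    return [b"hello"]
```

Wrapping is then a one-liner, e.g. app = strip_etag(demo_app), with the real
static file app in place of demo_app.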

The trick with doing MD5 is how/when do you calculate the MD5 hash to
compare it to If-None-Match? Clearly MD5 hashing is more expensive than just
stat()ing a file. I can think of a few possibilities:

- an in-memory cache mapping resources -> hashes: calculate the MD5 hash
when you serve the file for the first time, and remember it after that

- hash the file unconditionally: if you assume that your request is
ultimately bound more by network traffic than by the cost of reading the
file off disk, then it's still cheaper to read the whole file through RAM
and return 304 Not Modified than it would be to serve the whole file over
the network. But if you actually end up serving it (i.e. the ETag doesn't
match), then it's going to be hard not to read the file a second time in
order to serve it up.

- store the MD5 hash persistently somewhere, perhaps just in a file named by
appending .md5 to the filename: if it exists, assume it holds the right MD5
hash

Anyway, obviously this is tricky, but I'm curious if anyone else has tackled
this issue, or if anyone would consider adding some kind of MD5 support like
the options listed above to Paste/StaticURLParser/fileapp - i.e. what if
StaticURLParser just looked for a .md5 file on disk, used it if it found
one, and otherwise fell back to the current mechanism?
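
To make the proposal concrete, here is a minimal sketch of the lookup with
fallback. The function names are hypothetical, and stat_etag is only a
stand-in for "the current mechanism": it builds a tag from mtime and size,
which is not necessarily the format Paste generates. The sidecar is assumed
to hold the hex digest as its first whitespace-separated field, which also
accepts the "digest  filename" lines that md5sum writes.

```python
import os


def sidecar_etag(path):
    """Return a quoted ETag read from path + '.md5', or None if unusable."""
    try:
        with open(path + ".md5", "r") as f:
            fields = f.read().split()
    except OSError:
        return None  # no sidecar (or unreadable): let the caller fall back
    if not fields:
        return None  # empty sidecar file
    return '"%s"' % fields[0].lower()


def stat_etag(path):
    """Stand-in for the existing timestamp-based tag (format is assumed)."""
    st = os.stat(path)
    return '"%d-%d"' % (int(st.st_mtime), st.st_size)


def choose_etag(path):
    """Prefer the precomputed MD5 sidecar; otherwise use the stat-based tag."""
    return sidecar_etag(path) or stat_etag(path)
```

The sidecar files would have to be generated at build or commit time rather
than on each machine, so that every node in the cluster sees the same value.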

Alec

_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users
