Hello everybody, I've adapted a caching proxy for my own site's special needs. (It has lots of slow CGIs, BIG traffic, and lots of people clicking reload.)
It works really well. Here are my experiences and hacks.

> The mangled filenames are designed for speed and uniqueness, as well as
> an even distribution across the filesystem - all performance issues.

1) MANUALLY EXPIRING

Sometimes people want to write their own scripts to expire particular
URLs. So I just linked against the Apache library and made a small C
executable that returns the mangled name for a URL given as an argument.
Then you can simply delete the cached file for any URLs you want.

2) MANUALLY EXPIRING WITH RULES

Sometimes people want to write a script that expires URLs matching some
RULE (e.g. $url =~ /34.*oldies.*cgi?.*dog.*$/). In this case the
solution is to create an index of URL ==> mangled name. If you have a
limited number of URLs you can build the index with the executable from
point 1. If you have an unlimited number of URLs (dynamic sites...),
then you'll either have to crawl through the filesystem, or add a HOOK
into the cache so that it appends a record to the index file each time
it caches a file.

> One way of doing this is - when the "special mode" is on - replace a 502
> Bad Gateway result from a conditional response with a 304 Not Modified.
> This way when the browser or the intermediate proxy asks "is my cached
> copy fresh" the Apache proxy will say "yes" - even though the backend
> server is toasted and there is no way of being sure.

3) STOPPING PRAGMA NO-CACHE

Sometimes URLs are so expensive for the backend to create that you NEED
a "special mode" that ignores "Pragma: no-cache" and other such headers.
I just removed all of that code myself; it's not a big deal. But then
the resulting proxy is only good for that one application - I didn't do
it in a general way.

4) SIMULTANEOUS REQUESTS NOT CACHED

Sometimes URLs are so expensive that your server can go DOWN in the
time it takes to generate new content for the URL. For example, I have
a CGI that takes 7 seconds to generate. In those 7 seconds, 20 people
can click and thereby kill the server. For this you need some sort of
FLOCK or semaphore to make them all wait. I didn't implement this;
instead I generate the file with a program in the background.
Alternatively you can configure the server that responds to this URL to
handle only one process, but then I think the other clients will get
error messages...

5) I WANT TO HELP

By the way, I hope to contribute in some way to the proxy cache. I know
C and CVS and am pretty meticulous. Unfortunately, I don't have huge
amounts of time and have never worked on a group open-source project
before. Nevertheless, if someone needs something done, maybe I can do
it.

Bye
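To make point 1 concrete, here is a rough sketch of what such a
command-line helper can look like. The FNV-1a hash below is a stand-in
of my own, purely for illustration - a real version must link the
actual mangling routine from the Apache proxy sources so the names it
prints match what is actually on disk:

```c
/* Sketch of point 1: turn a URL into a cache filename.
 * NOTE: toy FNV-1a hash for illustration only; link the proxy's own
 * mangling function for real use, or the names won't match the cache. */
#include <stdio.h>
#include <stdint.h>

static void mangle(const char *url, char *out, size_t outlen)
{
    uint64_t h = 1469598103934665603ULL;    /* FNV-1a 64-bit offset basis */
    const unsigned char *p;
    for (p = (const unsigned char *)url; *p; p++) {
        h ^= *p;                            /* xor in the next byte */
        h *= 1099511628211ULL;              /* FNV-1a prime */
    }
    snprintf(out, outlen, "%016llx", (unsigned long long)h);
}

/* A tiny main() would just do:
 *     mangle(argv[1], name, sizeof name); puts(name);
 * so that a shell script can expire any URL with something like:
 *     rm $CACHEROOT/`mangleurl "$URL"`                                */
```

The helper is deterministic, so a script can recompute the same name
for the same URL at any time and delete the cached copy.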
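The rule-based expiry from point 2 could be sketched like this. The
index format (one "mangled-name URL" pair per line) is my own invention
for the sketch, not the proxy's format; the tool walks the index,
matches each URL against a POSIX extended regex, and deletes the cached
file for every hit:

```c
/* Sketch of point 2: expire every cached file whose URL matches a RULE.
 * Assumes an index file (my own format) with lines: <mangled> <url> */
#include <stdio.h>
#include <regex.h>
#include <unistd.h>

/* Returns the number of cache files expired, or -1 on error. */
int expire_matching(const char *indexpath, const char *pattern)
{
    char line[4096], mangled[1024], url[2048];
    regex_t re;
    FILE *fp;
    int expired = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    if ((fp = fopen(indexpath, "r")) == NULL) {
        regfree(&re);
        return -1;
    }
    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "%1023s %2047s", mangled, url) != 2)
            continue;                        /* skip malformed lines */
        if (regexec(&re, url, 0, NULL, 0) == 0) {
            unlink(mangled);                 /* delete the cached copy */
            expired++;
        }
    }
    fclose(fp);
    regfree(&re);
    return expired;
}
```

The cache HOOK mentioned above would be the thing appending those
"mangled URL" lines as files get cached; with it in place this tool
works even for an unlimited, dynamic URL space.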
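The FLOCK idea from point 4 could look roughly like this. It's a
standalone sketch, not proxy code: the lock filename, the helper names,
and the "cache file exists == fresh" convention are all my own
assumptions. The first requester takes the lock and runs the expensive
generation; the other 19 clickers block on flock() and, once they get
it, find the file already regenerated and do nothing:

```c
/* Sketch of point 4: serialize regeneration of an expensive page so
 * 20 simultaneous clicks cause one regeneration, not twenty. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

static int generate_calls = 0;

/* Stand-in for the expensive 7-second CGI. */
static void demo_generate(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f) { fputs("fresh content\n", f); fclose(f); }
    generate_calls++;
}

/* Returns 1 if this process did the regeneration, 0 if another process
 * did it while we were blocked on the lock, -1 on error.  Expiry is
 * "delete the cache file"; a missing file means "regenerate". */
int regenerate_once(const char *cachefile, void (*generate)(const char *))
{
    int fd = open("regen.lock", O_CREAT | O_RDWR, 0644);
    int did_work = 0;

    if (fd < 0)
        return -1;
    flock(fd, LOCK_EX);                 /* all simultaneous clickers queue here */
    if (access(cachefile, F_OK) != 0) { /* still missing? we regenerate */
        generate(cachefile);            /* the expensive part runs once */
        did_work = 1;
    }                                   /* else: someone beat us to it  */
    flock(fd, LOCK_UN);
    close(fd);
    return did_work;
}
```

The re-check after acquiring the lock is the important part: a waiter
that wakes up must notice the file was already regenerated, otherwise
all 20 processes would still run the 7-second job one after another.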
