Hello everybody,

I've modified a caching proxy for my own
site's special needs. (It has lots of slow CGIs
and BIG traffic and lots of people clicking reload.)

It works really well. Here are my experiences and hacks.


> The mangled filenames are designed for speed and uniqueness, as well as
> an even distribution across the filesystem - all performance issues.

1) MANUALLY EXPIRING
Sometimes people want to write their own scripts to expire
particular URLs. So I just linked against the Apache library
and made a small C executable that returns the mangled name
for a URL given as an argument. Then you can simply delete the
cached file for any URLs you want.
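
Here's roughly what that wrapper looks like. It assumes the
ap_proxy_hash() from Apache 1.3's proxy_util.c (declared in
mod_proxy.h); the name, signature, and the depth/length values
may differ in your tree, so treat this as a sketch:

    #include <stdio.h>

    /* assumed prototype, from Apache 1.3's proxy_util.c */
    extern void ap_proxy_hash(const char *it, char *val,
                              int ndepth, int nlength);

    int main(int argc, char **argv)
    {
        char mangled[256];

        if (argc != 2) {
            fprintf(stderr, "usage: %s URL\n", argv[0]);
            return 1;
        }
        /* 3 and 1 mirror the CacheDirLevels/CacheDirLength
           defaults; use whatever your httpd.conf sets */
        ap_proxy_hash(argv[1], mangled, 3, 1);
        printf("%s\n", mangled);
        return 0;
    }

Compile it against the proxy sources, and a shell script can
just rm the filename it prints (relative to the cache root).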

2) MANUALLY EXPIRING WITH RULES
Sometimes people want to write a script that expires URLs
matching some RULE. (e.g. $url =~ /34.*oldies.*cgi?.*dog.*$/ )
In this case the solution is to create an index of URL ==> mangled name.
If you have a limited number of URLs you can build the index
with the executable from point 1. If you have an unlimited number
of URLs (dynamic sites...), then you'll either have to crawl through
the filesystem, or add a HOOK into the cache so that it appends a record
to the index file each time it caches a file.
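
For the hook variant, something like this at the point where the
proxy finishes writing a cache file would do. The function name
and the index path are made up here, just to show the shape of it:

    #include <stdio.h>

    /* hypothetical hook: call wherever the proxy commits a new
       cache file, so an expiry script can later grep the index
       and unlink the files whose URLs match its rule */
    static void index_cache_entry(const char *url, const char *mangled)
    {
        FILE *idx = fopen("/var/cache/proxy/url.index", "a");
        if (idx != NULL) {
            fprintf(idx, "%s %s\n", mangled, url);
            fclose(idx);
        }
    }

The expiry script then scans url.index, applies its regex to the
URL column, and deletes the mangled files that match.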

> One way of doing this is - when the "special mode" is on - replace a 502
> Bad Gateway result from a conditional response with a 304 Not Modified.
> This way when the browser or the intermediate proxy asks "is my cached
> copy fresh" the Apache proxy will say "yes" - even though the backend
> server is toasted and there is no way of being sure.
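
If I read the quoted suggestion right, it amounts to a status
rewrite at the point where the proxy relays the backend's answer.
A standalone sketch of the idea (all names here are hypothetical):

    /* the quoted 502 -> 304 trick: if the client asked "is my
       copy fresh?" and the backend is dead, claim freshness so
       the client keeps using its cached copy */
    static int adjust_status(int backend_status,
                             int request_was_conditional,
                             int special_mode)
    {
        if (special_mode && request_was_conditional
            && backend_status == 502)
            return 304;   /* Not Modified */
        return backend_status;
    }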
 

3) STOPPING PRAGMA NO-CACHE
Sometimes URLs are so expensive for the backend to create
that you NEED a "special mode" that ignores "Pragma: no-cache"
and other such headers. I just removed all of that code myself;
it's not a big deal. But then the resulting proxy is only good
for that one application. I didn't make it general.
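
If you'd rather not rip the code out, the concept reduces to a
guard like this (a self-contained sketch of the idea, not the
proxy's actual code; every name here is invented):

    #include <stdio.h>
    #include <string.h>

    static int special_mode = 1;  /* on for the one expensive app */

    /* decide whether a client header may force a refetch; in
       special mode reloads are ignored and the cached copy is
       always served if present */
    static int client_forces_refetch(const char *pragma,
                                     const char *cache_control)
    {
        if (special_mode)
            return 0;
        return (pragma && strstr(pragma, "no-cache")) ||
               (cache_control && strstr(cache_control, "no-cache"));
    }

    int main(void)
    {
        /* prints 0: the reload is ignored in special mode */
        printf("%d\n", client_forces_refetch("no-cache", NULL));
        return 0;
    }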

4) SIMULTANEOUS REQUESTS NOT CACHED
Sometimes URLs are so expensive that your server can go DOWN
in the time it takes to generate fresh content for the URL.
For example, I have a CGI that takes 7 seconds to generate. In
the time it takes to generate that page, 20 people can click
and thereby kill the server. For this you need some sort of FLOCK
or semaphore to make them all wait. I didn't implement this; instead
I generate the file with a program in the background. Alternatively
you can configure the server that handles these requests to run only
one process, but then I think the clients will get error messages...
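
For anyone who wants the FLOCK route: the idea is that the first
request for a missing entry takes an exclusive lock and regenerates
it, while everyone else blocks on the lock and then reads the fresh
file. A sketch with made-up paths and names:

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    int fetch_with_lock(const char *cache_file, const char *lock_file)
    {
        int fd = open(lock_file, O_CREAT | O_RDWR, 0644);
        if (fd < 0)
            return -1;

        if (flock(fd, LOCK_EX) == 0) {   /* blocks until we own it */
            if (access(cache_file, R_OK) != 0) {
                /* still missing: this request regenerates it;
                   run the slow CGI and write cache_file here */
            }
            flock(fd, LOCK_UN);
        }
        close(fd);
        return 0;
    }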

5) I WANT TO HELP
By the way, I hope to contribute in some way to the proxy cache.
I know C and CVS and am pretty meticulous. Unfortunately, I don't
have huge amounts of time and have never worked on a group open
source project before. Nevertheless, if someone needs something
done, maybe I can do it.

Bye
