On Tue, 24 Oct 2006, Graham Leggett wrote:
* Allow disk cache to realise that a (large) file is the same
regardless of which URL is used to access it. Reduces cache disk
usage a lot for sites like ours that are known as ftp.acc.umu.se,
ftp.se.debian.org, ftp.gnome.org, se.releases.ubuntu.com,
releases.mozilla.org and so on.
Perhaps this could be as simple as using ServerName and ServerAlias
(unless the name of the site is part of the URL, which will happen in the
forward proxy case) to reduce the cached URL to a canonical form before
storing in and/or retrieving from the cache.
We have a few different servernames depending on which site it's
serving (needs to cater for official download locations and so on) so
I guess that won't help much.
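For illustration only, the ServerName/ServerAlias idea could be sketched roughly as below. This is Python rather than mod_disk_cache code, the alias table and the cache_key() name are hypothetical, and the grouping of host names is made up for the example. It also shows the limitation: only names listed as aliases of the same canonical name collapse to one key, so sites served under separate ServerNames still get separate cache entries.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: each ServerAlias maps to its vhost's ServerName.
# The host names come from the discussion above; the grouping is invented.
CANONICAL_HOST = {
    "ftp.se.debian.org": "ftp.acc.umu.se",
    "ftp.gnome.org": "ftp.acc.umu.se",
}

def cache_key(url: str) -> str:
    """Reduce a URL to a canonical form suitable for use as a cache key."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()       # hostname is case-insensitive
    netloc = CANONICAL_HOST.get(host, host)     # unknown hosts pass through
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Drop the fragment; it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), netloc, parts.path, parts.query, ""))
```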
* Add option to not try to remove cache directories in the cache
structure. IMHO, this should never be needed since the cache
directory should not be excessively deep (which the broken defaults
lead to). Davi had a fix for the cache dir layout I think, and I
personally think that neither mod_disk_cache nor htcacheclean should
do rmdir.
It makes sense that mod_disk_cache shouldn't do it, but perhaps it should
be tunable for htcacheclean.
Arguably. But if you ever need to remove directories in the cache
hierarchy you should really start to wonder why they were created in
the first place...
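A minimal sketch of a cleanup pass that never does rmdir, assuming a simple age-based policy (which is a simplification chosen for the example, not a description of how htcacheclean decides what to remove). The clean_cache() name and its parameters are hypothetical.

```python
import os
import time
from typing import Optional

def clean_cache(root: str, max_age_seconds: float,
                now: Optional[float] = None) -> int:
    """Remove cache files older than max_age_seconds under root.

    Directories are deliberately left in place: with a shallow, fixed
    layout they get reused, so removing them only costs extra work.
    Returns the number of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.stat(path).st_mtime > max_age_seconds:
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                # Another process removed the file first; nothing to do.
                pass
    return removed
```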
* Eventually add option to have header and body in the same cachefile.
Is there an advantage to this? IIRC Brian reported that a body in a
separate file can take advantage of sendfile, and is as a result much
faster.
We use combined header/body, and sendfile works flawlessly. Linux
sendfile has problems when a file being sent with sendfile() is
written to via mmap, and all sendfile implementations have problems
with overlapping sendfile/write operations.
The main advantage is half the number of inodes and that by removing
one file you get rid of both the header and body. I suspect that the
performance gain is minimal though.
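To show why a combined file does not rule out sendfile, here is a rough sketch, assuming a made-up layout of a 4-byte length prefix, then the headers, then the body. This is not the on-disk format of mod_disk_cache or of our patches, and all function names are hypothetical. The body is sent with sendfile starting at the offset just past the headers. The sketch does not address the mmap and overlapping-write problems mentioned above.

```python
import os
import struct

LEN_FMT = "!I"                       # 4-byte big-endian header length (assumed)
LEN_SIZE = struct.calcsize(LEN_FMT)

def write_entry(path: str, headers: bytes, body: bytes) -> None:
    """Store headers and body in a single cache file."""
    with open(path, "wb") as f:
        f.write(struct.pack(LEN_FMT, len(headers)))
        f.write(headers)
        f.write(body)

def open_entry(path: str):
    """Return (fd, headers, body_offset, body_length) for a cache file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        (hlen,) = struct.unpack(LEN_FMT, os.pread(fd, LEN_SIZE, 0))
        headers = os.pread(fd, hlen, LEN_SIZE)
        body_offset = LEN_SIZE + hlen
        body_length = os.fstat(fd).st_size - body_offset
    except Exception:
        os.close(fd)
        raise
    return fd, headers, body_offset, body_length

def send_body(sock, fd: int, offset: int, count: int) -> int:
    """Send count body bytes from fd to sock with sendfile, skipping headers."""
    sent = 0
    while sent < count:
        n = os.sendfile(sock.fileno(), fd, offset + sent, count - sent)
        if n == 0:                   # file shorter than expected
            break
        sent += n
    return sent
```

Removing the single file with os.remove() gets rid of both header and body at once, which is the inode saving described above.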
A more formal cache cleanup process needs to be fleshed out, offering the
options above both as options in code and in the documentation, as you say.
Your experience and Brian's sit at opposite extremes for high volume
caches: one sees few hits on large files, the other many hits on small
files. This should make for some useful tuning information.
The extreme difference is what makes me think that we should
acknowledge that both cases exist and provide the relevant knobs where
necessary. As it looks right now, those knobs tend to be more
OS/filesystem specific, but that might change as this evolves.
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
Buy a 486-33 you can reboot faster..
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=