Neil Gunton wrote:
It seems like this might have something to do with mod_deflate, which
I am using in combination with mod_disk_cache. This page gives a clue
that there might be a problem with the way files are cached when these
modules are both enabled:
http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html
I have just been doing some experimentation on my development
workstation. It seems that with mod_deflate enabled, mod_cache doesn't
cache properly, or at least not as I would expect: I tested with two
browsers (Mozilla and Opera), both with no cookies related to the site, and
loading the same page from each. Both requests were passed through to
the back-end, i.e. were cached separately. This is with mod_deflate
enabled for html pages. So I disabled mod_deflate (just commented out
that one line), restarted the servers, cleared the caches of both
browsers and mod_cache, and tried again. This time, the first request
was passed through to the backend (as expected), but the second request,
from the other browser for the same page, was this time retrieved from
mod_cache. Also, the cache directories on the server end look a lot
simpler, I guess because the Vary header is no longer being set by
mod_deflate. This is very interesting. I'm going to do some more testing
on the production server, by clearing the mod_disk_cache cache and
disabling mod_deflate for a while to see how things run. I know the
content transmitted will be larger and thus slower for people on slow
connections, but right now I'm interested in seeing how this affects the
performance of htcacheclean, and even du, to see whether the time taken to
traverse the directories gets much better without all those extra Vary subdirs.
In any case, it would seem that the cache wasn't really working after
all, which might explain the large number of cache directories -
multiple versions of the same content. Yikes.
Well, that seemed to do the trick! So the caveat seems to be: Be careful
using both mod_deflate and mod_cache (mod_disk_cache specifically)
together if you have a large dynamic website that can generate a large
number of distinct pages. mod_deflate adds a Vary: Accept-Encoding
header, which forces mod_cache to store a separate version of the same
content for each distinct Accept-Encoding value that clients send. To
compound this, every version involves additional subdirs in the cache,
which makes traversing it in any fashion very, very time consuming,
producing high iowait even for a fast 4-disk SCSI RAID0 setup.
It took more than three hours just to delete the old cache.
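
To illustrate why the two browsers ended up with separate cache entries, here is a minimal Python sketch of Vary-based cache keying. It is a simplified model, not mod_cache's actual implementation: the functions cache_key and count_backend_hits are hypothetical, and the two Accept-Encoding strings are assumed example values, not the ones my browsers actually sent.

```python
def cache_key(url, request_headers, vary):
    """Simplified cache key: the URL plus the value of every request
    header that the response's Vary header names."""
    parts = [url]
    for name in vary:
        parts.append(f"{name.lower()}={request_headers.get(name, '')}")
    return "|".join(parts)


def count_backend_hits(requests, vary):
    """Count how many requests miss the cache and reach the back-end."""
    cache = set()
    backend_hits = 0
    for url, headers in requests:
        key = cache_key(url, headers, vary)
        if key not in cache:
            backend_hits += 1
            cache.add(key)
    return backend_hits


# Same page requested by two browsers; the header values are assumptions.
requests = [
    ("/page/1", {"Accept-Encoding": "gzip,deflate"}),
    ("/page/1", {"Accept-Encoding": "deflate, gzip, x-gzip, identity, *;q=0"}),
]

# With Vary: Accept-Encoding, each distinct header value is its own entry.
with_vary = count_backend_hits(requests, vary=["Accept-Encoding"])
# Without a Vary header, the URL alone identifies the entry.
without_vary = count_backend_hits(requests, vary=[])

print("back-end hits with Vary:", with_vary)
print("back-end hits without Vary:", without_vary)
```

In this model the second browser only gets a cache hit when no Vary header is involved, which matches what I saw on my workstation.
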
Once I disabled mod_deflate, the new cache looks a lot cleaner - just
the three levels of directory that I specified in the config via
CacheDirLevels, and none of the extra .vary sub-levels.
Additionally, du now takes just a few seconds to traverse the cache,
which currently is set at 1GB. htcacheclean seems to be keeping up well
in daemon mode, with the -i and -n options. The large, ongoing iowait on the
server has disappeared completely.
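
For anyone who wants to check their own cache for the same symptom, a rough Python sketch along these lines could count the extra sub-levels. It assumes the variant directories have names ending in ".vary", as they did in my cache; summarize_cache is a hypothetical helper, and the demo builds a throwaway tree in a temp directory rather than touching a real cache, so point it at your own CacheRoot to use it.

```python
import os
import tempfile


def summarize_cache(root):
    """Walk a cache tree and count directories, files, and directories
    whose name ends in '.vary' (assumed naming for variant sub-levels)."""
    total_dirs = 0
    total_files = 0
    vary_dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total_dirs += len(dirnames)
        total_files += len(filenames)
        vary_dirs += sum(1 for d in dirnames if d.endswith(".vary"))
    return {"dirs": total_dirs, "files": total_files, "vary_dirs": vary_dirs}


# Demo on a made-up tree: three directory levels, then one variant
# directory with its own three levels underneath.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b", "c",
                             "entry.header.vary", "x", "y", "z"))
    open(os.path.join(root, "a", "b", "c", "entry.header"), "w").close()
    demo = summarize_cache(root)

print(demo)
```

A large vary_dirs count relative to the number of files would suggest the cache is holding many variants of the same pages.
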
Web pages seem to render a little faster in the browser too. That may be
my imagination and/or placebo effect, but it might make sense if there
isn't that additional compression/decompression going on at both ends.
The only downside is that people on extremely slow dialup connections
might notice longer download times for page text... but I have to wonder
if that's really an issue today. Back in 1998 perhaps you might care
about something being 20KB rather than 80KB, but surely not today. In
any case, don't dialup ISPs often implement their own compression now?
Anyway, hope that's helpful to anybody running large dynamic websites
behind a reverse proxy. Keep mod_cache, maybe think about ditching
mod_deflate. The combination does technically work, but for large
numbers of pages, it can make your cache size (and your iowait) explode.
Neil