Neil Gunton wrote:
It seems like this might have something to do with mod_deflate, which I am using in combination with mod_disk_cache. This page gives a clue that there might be a problem with the way files are cached when these modules are both enabled:

http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html

I have just been doing some experimentation on my development workstation. It seems that with mod_deflate enabled, mod_cache doesn't cache properly, or at least not as I would expect. I tested with two browsers (Mozilla and Opera), both with no cookies related to the site, loading the same page from each. Both requests were passed through to the back-end, i.e. they were cached separately. This is with mod_deflate enabled for html pages.

So I disabled mod_deflate (just commented out that one line), restarted the servers, cleared the caches of both browsers and mod_cache, and tried again. This time, the first request was passed through to the back-end (as expected), but the second request, from the other browser for the same page, was retrieved from mod_cache. Also, the cache directories on the server end look a lot simpler, I guess because the Vary header is no longer being set by mod_deflate.

This is very interesting. I'm going to do some more testing on the production server, by clearing the mod_disk_cache cache and disabling mod_deflate for a while to see how things run. I know the content transmitted will be larger and thus slower for people on slow connections, but right now I'm interested in seeing how this affects the performance of htcacheclean, and even du - to see if times for traversing the directories get much better without all those extra Vary subdirs. In any case, it would seem that the cache wasn't really working after all, which might explain the large number of cache directories - multiple versions of the same content. Yikes.

Well, that seemed to do the trick! So the caveat seems to be: Be careful using both mod_deflate and mod_cache (mod_disk_cache specifically) together if you have a large dynamic website that can generate a large number of distinct pages. Mod_deflate adds a Vary: Accept-Encoding header, so mod_cache stores a separate copy of the same content for each distinct Accept-Encoding value that browsers send. To compound this, every version involves additional subdirs in the cache, which makes traversing it in any fashion very, very time consuming, producing high iowait even for a fast 4-disk SCSI RAID0 setup.
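To make the effect of the Vary header concrete, here is a minimal Python sketch of a Vary-aware cache key. It is not mod_cache's actual code: the function names, the example URL and the two Accept-Encoding strings are assumptions chosen for illustration. It only shows why two browsers asking for the same page can end up with two cache entries once a Vary header is present.

```python
import hashlib

def variant_key(url, vary, request_headers):
    """Hypothetical cache key: the URL plus every request header named in Vary."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    parts = [url]
    for name in sorted(names):
        # A missing header is treated as an empty value.
        parts.append(name + "=" + request_headers.get(name, ""))
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()

def count_variants(url, vary, requests):
    """Number of distinct cache entries the given requests would create."""
    return len({variant_key(url, vary, headers) for headers in requests})

# Illustrative values only: two browsers that word Accept-Encoding differently.
REQUESTS = [
    {"accept-encoding": "gzip,deflate"},
    {"accept-encoding": "deflate, gzip, x-gzip, identity, *;q=0"},
]
URL = "http://example.com/page/1"

with_vary = count_variants(URL, "Accept-Encoding", REQUESTS)
without_vary = count_variants(URL, "", REQUESTS)
print("entries with Vary:", with_vary)
print("entries without Vary:", without_vary)
```

Multiply the first number by the number of distinct pages on a large dynamic site and the growth in cache entries follows directly.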

It took more than three hours just to delete the old cache.

Once I disabled mod_deflate, the new cache looks a lot cleaner - just the three levels of directory that I specified in the config via CacheDirLevels, and none of the extra .vary sub-levels.
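If you want to check whether your own cache has the same problem before clearing it, a rough Python sketch like the following counts how many directories under the cache root are Vary sub-levels. It assumes those directories have names ending in ".vary", as seen above; the function name and the suffix parameter are my own, so adjust them to whatever your cache tree actually contains. Be aware that walking a very large cache is itself slow and I/O heavy.

```python
import os

def count_vary_dirs(cache_root, suffix=".vary"):
    """Return (total_dirs, vary_dirs) found beneath cache_root.

    The ".vary" suffix is an assumption about how the Vary
    sub-levels are named on disk; change it if yours differ.
    """
    total_dirs = 0
    vary_dirs = 0
    for _dirpath, dirnames, _filenames in os.walk(cache_root):
        total_dirs += len(dirnames)
        vary_dirs += sum(1 for name in dirnames if name.endswith(suffix))
    return total_dirs, vary_dirs
```

Call it with the path you gave to CacheRoot, e.g. count_vary_dirs("/path/to/cache"), and compare the two numbers. A high proportion of Vary directories suggests the cache is storing many variants per page.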

Additionally, du now just takes a few seconds to traverse the cache, which currently is set at 1GB. Htcacheclean seems to be keeping up well in daemon mode, with -i -n options. The large, ongoing iowait on the server has disappeared completely.

Web pages seem to render a little faster in the browser too. That may be my imagination and/or placebo effect, but it might make sense if there isn't that additional compression/decompression going on both ends.

The only downside is that people on extremely slow dialup connections might notice longer download times for page text... but I have to wonder if that's really an issue today. Back in 1998 perhaps you might care about something being 20KB rather than 80KB, but surely not today. In any case, don't dialup ISPs often implement their own compression now?

Anyway, hope that's helpful to anybody running large dynamic websites behind a reverse proxy. Keep mod_cache, maybe think about ditching mod_deflate. The combination does technically work, but for large numbers of pages, it can make your cache size (and your iowait) explode.

Neil
