The Good Deed

When we started using ClamAV, we wanted to distribute the database to the several machines on our LAN in order to reduce the load on the volunteer servers and minimize the load on our old DSL (now gone). The best way to do this, it seemed, was to set up a trivial HTTP server to mirror and deliver the new files. And, of course, they had to be cvd files which, according to the FAQ, precluded "Scripted Updates" and the much smaller cdiff files.

The Punishment

This all worked quite well until ClamAV switched to distributing the updates via Cloudflare: then The Delays started. The Delays first showed up when freshclam itself(!) found that the DNS TXT record advertised a new daily.cvd but then failed to retrieve it, complaining about network problems. Eventually this would cause all the mirrors to be disabled.

After much investigation (documented at length in previous posts) I noticed that the daily.cvd from the BOS Cloudflare server was often far behind that from the IAD Cloudflare server (which always seemed to match the DNS TXT advertisement). I began to suspect that this was caused by a caching web proxy, probably a transparent one "helpfully" interposed by Comcast.

While all this was going on, Joel stated that nobody else was having (or at least reporting) these Delay problems. Now I think I know why.

The Explanation

Most everybody (I would guess) uses the Scripted Update feature, which is enabled by default. So I ran an experiment. On one machine I bypassed local mirroring, enabled Scripted Update *and* captured the HTTP traffic to/from Cloudflare via dumpcap. What I found was that Scripted Update does HTTP GETs for one or more daily-12345.cdiff files in sequence, each, presumably, updating "daily" from the numerically previous version.

Now it became clear! Each daily-12345.cdiff *always* has the same content, no matter when it is retrieved. The content of daily.cvd, on the other hand, varies over time. That makes *any* caching of daily.cvd files liable to cause versioning problems, whereas the cdiff files (such as daily-12345.cdiff) are immune to any caching whatsoever: web caches work according to file *name*, not file content. The problem is exacerbated by the fact that the Cloudflare servers seem to add a "Cache-Control:" HTTP header that does NOT specify "no-cache".
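
To make the name-versus-content point concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the Origin and NameKeyedCache classes are hypothetical stand-ins, the version numbers are made up, and the cache never revalidates, which is the worst case rather than a description of what Cloudflare or any real proxy actually does.

```python
# Toy model of a caching proxy that keys entries on the request path only.
# The file names mirror those in the post; the version numbers are made up.

class Origin:
    """Stands in for the upstream server; publishes successive daily versions."""
    def __init__(self, version):
        self.version = version

    def publish(self):
        self.version += 1

    def get(self, path):
        if path == "daily.cvd":
            # Mutable name: the content depends on when it is fetched.
            return f"full database at version {self.version}"
        if path.startswith("daily-") and path.endswith(".cdiff"):
            n = int(path[len("daily-"):-len(".cdiff")])
            if n <= self.version:
                # Immutable name: the content is fixed by the number in the name.
                return f"diff from version {n - 1} to {n}"
        raise KeyError(path)


class NameKeyedCache:
    """Caches by path and never revalidates (assumed worst case)."""
    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.origin.get(path)
        return self.store[path]


origin = Origin(version=12345)
proxy = NameKeyedCache(origin)

first_cvd = proxy.get("daily.cvd")            # cached while upstream is at 12345
origin.publish()                              # upstream moves to 12346
stale_cvd = proxy.get("daily.cvd")            # same name, so the old copy is served
fresh_cdiff = proxy.get("daily-12346.cdiff")  # new name, so a cache miss

print(stale_cvd)
print(origin.get("daily.cvd"))
print(fresh_cdiff)
```

The fixed name daily.cvd keeps returning the cached copy after the origin has moved on, while the numbered cdiff can only ever be a miss or a correct hit.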
(I don't know what Cache-Control header, if any, the old "volunteer" servers sent.)

The upshot of this is that the Scripted Update mechanism will *never* get out-of-date cdiff files, although it may experience a short delay if it's the first requester of a new cdiff. The local mirror mechanism, on the other hand, is almost guaranteed to fail on occasion -- or at least suffer arbitrary delays -- if there is a caching proxy in its path to Cloudflare. Even if the Cloudflare servers used a "Cache-Control: no-cache" header, there might be a rogue proxy in the way that ignores this header and caches anyway. (AFAIK, there is no way to enforce "no-cache".)

So what could be done to avoid the problem?

One possibility is to give up on local mirrors. But that might increase the load on the Cloudflare servers: an installation with more local ClamAV clients than the ratio of the size of a full cvd to the size of a typical cdiff would pull more bytes as per-client cdiffs than it would as a single mirrored cvd.

A solution to that would be to use a local HTTP proxy to distribute the cdiff files to all the ClamAV installations on the LAN. (But that would require rather complicated setup.)

A third approach would be to do the mirroring using the cdiff-generated cld files rather than the cvd files. I don't know what changes to freshclam this would require. One possible obstacle is whether the cld files are, or could be, cryptographically signed like the cvds are. Something like that would likely be necessary for enterprise security. (Presumably, generating Talos-signed cvds locally from the clds would be a really bad idea, while setting up private PKI for local signing would be a really big pain.)

A fourth, and I think very simple, approach would be to name cvds like the cdiffs are named. In other words, instead of having daily.cvd, one would have daily-12345.cvd, followed by daily-12346.cvd as the next update. This would be impervious to the vagaries of caching.
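
A minimal Python sketch of how a client might pick the current database under such a naming scheme. The daily-NNNNN.cvd names are the proposal from this post, not something ClamAV ships today, and latest_versioned is a hypothetical helper, not freshclam code.

```python
import re

def latest_versioned(names, db="daily", ext="cvd"):
    """Return the name matching <db>-<number>.<ext> with the highest number, or None."""
    pattern = re.compile(rf"^{re.escape(db)}-(\d+)\.{re.escape(ext)}$")
    best = None
    for name in names:
        m = pattern.match(name)
        if m:
            version = int(m.group(1))
            # Compare numerically so that 9999 sorts below 12345.
            if best is None or version > best[0]:
                best = (version, name)
    return best[1] if best else None

# Hypothetical directory listing on a mirror or in the local database directory.
listing = [
    "daily-12345.cvd",
    "daily-12346.cvd",
    "daily-9999.cvd",
    "daily-12346.cdiff",
    "main.cvd",
]
print(latest_versioned(listing))
```

Because every update gets a fresh name, a cached copy of an older file is never mistaken for the current one; at worst the newest name is briefly unavailable.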
I also think it would require only fairly trivial code changes to freshclam and to whatever component of ClamAV (re)loads the database. (All that would be necessary would be to always use the cvd with the highest version number.)

Any thoughts on all this? Is local mirroring still possible?

Paul

P.S. I would have thought that since the clds are much bigger than the corresponding cvds, loading a cld into memory would be slower than loading the equivalent cvd, but this seems not to be the case. To measure the load time I ran clamscan on one tiny file using the daily.cvd version of the signatures and then using the much bigger daily.cld. (Main.cvd remained as itself.) This was done on a fairly old machine, and before each run I, of course, did:

    echo 1 > /proc/sys/vm/drop_caches

The result was that total real (and 'user') times were slightly less for the cld, although the 'system' time was slightly more. I wonder what eats up the extra time. (I thought disks were always supposed to be the bottleneck for simple computations like decompression and crypto.)

_______________________________________________
clamav-users mailing list
clamav-users@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users

Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml