From a best-practices perspective, it is best to use freshclam when talking to ClamAV resources. Once you have what you need from them, you can do anything you like internally; you don't have to be nice to them at that point. I had a couple hundred Red Hat servers to manage, and they all required scanning software because of the industry I was in and because of the HIPAA, credit card, Social Security number, phone number and other personal-information rules we were bound to. I created a lot of locally generated signatures to look for this information. This was before smart file systems that would do this for us.

When I built the local private mirror I used the cdiff files (scripted downloads were permitted) to create locally patched .cld files. These had to be distributed to the hundreds of other machines. For that I initially used rsync because it is just bulletproof, and later I moved it all to CFengine (predecessor to Puppet and Chef).

The CFengine master server received the cld files from a snapshot file system (freshclam triggered the snapshot before and after an update) so new updates would not corrupt existing signature files, and it then immediately informed all the clients they had work to do to become conformal (in CFengine terms). CFengine is smart enough to transmit only the differences between local and remote files on the fly (rsync), so net traffic is minimized, as the daily and bytecode files don't change much. The process works by creating hidden files until the differences are resolved and then renaming the hidden files to the original names, which is very close to an atomic operation, so problems with files being read while still in transport were prevented. The CFengine client would notify the local clamd instance when the files were ready. Clamd has to be told not to reload on its own when it detects a signature change. All very clean, fast, and secure owing to using secure processes at each step, and hands-free on my part. It also passed federal government security audits, which was the best part.
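
The hidden-file-then-rename step described above can be sketched in a few lines. This is only an illustration of the general technique, not what CFengine or rsync actually run internally; the function name publish_atomically and the ".tmp" suffix are made up for the example.

```python
import os
import tempfile


def publish_atomically(data: bytes, dest_path: str) -> None:
    """Write data to a hidden temp file beside dest_path, then rename it into place.

    Hypothetical helper for illustration only.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Stage in the same directory so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(prefix=".", suffix=".tmp", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # On POSIX the rename is atomic: a reader sees the old file or the new
        # one, never a partially written file.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the hidden file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A scanner that opens daily.cld while an update is in flight keeps reading the previous complete file, which is why the reload notification can safely be sent only after the rename.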

Short answer - don't use freshclam to get the signature files from your mirror to your clients. Then it won't matter whether they are cld, cvd, cud, etc., and you don't burden the ClamAV servers by pulling full copies of CVD files.

As for the cdiff files not changing, that is by design. Each cdiff file brings the local cld file to the cdiff version, and because it can't be known how many cdiffs have been created between a user's updates, they are retained for a period of time. freshclam applies them in order until the final cdiff matches the current DNS TXT record.
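
To make the ordering concrete, here is a minimal sketch of which cdiff names would be needed to go from a local version to the advertised one. It assumes the daily-12345.cdiff naming quoted below and takes both version numbers as plain integers; the function name cdiffs_to_apply is hypothetical, and this is not freshclam's own code.

```python
def cdiffs_to_apply(database: str, local_version: int, advertised_version: int) -> list[str]:
    """List the cdiff files needed, in order, to reach advertised_version.

    Hypothetical helper: each cdiff N is assumed to bring the local database
    from version N-1 to version N.
    """
    if advertised_version < local_version:
        raise ValueError("local database is newer than the advertised version")
    return [
        f"{database}-{version}.cdiff"
        for version in range(local_version + 1, advertised_version + 1)
    ]


# A client that last updated at 12345 when 12348 is advertised needs three diffs.
needed = cdiffs_to_apply("daily", 12345, 12348)
```

Because every name in that list refers to fixed content, it does not matter how long ago each file was first fetched or cached.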


On 12/14/18 6:58 PM, Paul Kosinski wrote:
The Good Deed

When we started using ClamAV, we wanted to distribute the database
to the several machines on our LAN in order to reduce the load on the
volunteer servers and minimize the load on our old DSL (now gone). The
best way to do this, it seemed, was to set up a trivial HTTP server to
mirror and deliver the new files. And, of course, they had to be cvd
files which, according to the FAQ, precluded "Scripted Updates" and the
much smaller cdiff files.

The Punishment

This all worked quite well until ClamAV switched to distributing the
updates via Cloudflare: then The Delays started. The Delays initially
exhibited themselves when freshclam itself(!) found that the DNS TXT
record said that a new daily.cvd was available but upon trying to
retrieve it freshclam failed, complaining about network problems. This
eventually would cause all the mirrors to be disabled.

After much investigation (documented at length in previous posts) I
noticed that the daily.cvd from the BOS Cloudflare server was often far
behind that from the IAD Cloudflare server (which always seemed to
match the DNS TXT advertisement). I began to suspect that this was
perhaps caused by a caching web proxy, probably a transparent one
"helpfully" interposed by Comcast.

While all this was going on, Joel stated that nobody else was having
(or at least reporting) these Delay problems.

Now I think I know why.

The Explanation

Most everybody (I would guess) uses the Scripted Update feature, which
is enabled by default. So, I ran an experiment. On one machine I
bypassed local mirroring, enabled Scripted Update *and* captured the
HTTP traffic to/from Cloudflare via dumpcap. What I found was that
Scripted Update does HTTP GETs for one or more daily-12345.cdiff
files in sequence, each, presumably, updating "daily" from the
numerically previous version.

Now it became clear! Each daily-12345.cdiff *always* has the same
content, no matter when it is retrieved. The content of daily.cvd, on
the other hand varies over time. That makes *any* caching of daily.cvd
files susceptible to cause versioning problems, whereas the cdiff files
(such as daily-12345.cdiff) are totally invulnerable to any caching
whatsoever: web caches work according to file *name*, not file content.
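
The name-versus-content point can be illustrated with a toy model. This is a deliberately simplified sketch: the class NameKeyedCache and the placeholder contents are invented for the example, and a real proxy would also apply expiry and revalidation rules that are ignored here.

```python
class NameKeyedCache:
    """Toy cache that keys entries on the file name only and never expires them."""

    def __init__(self, origin: dict):
        self.origin = origin  # stands in for the upstream server: name -> content
        self.store = {}

    def get(self, name: str) -> str:
        # Fetch from the origin only the first time a name is requested.
        if name not in self.store:
            self.store[name] = self.origin[name]
        return self.store[name]


origin = {"daily.cvd": "version 12345", "daily-12345.cdiff": "diff to 12345"}
cache = NameKeyedCache(origin)
cache.get("daily.cvd")  # the cache now holds version 12345 under this name

# Upstream publishes an update: the same cvd name gets new content,
# while the new diff gets a brand-new name.
origin["daily.cvd"] = "version 12346"
origin["daily-12346.cdiff"] = "diff to 12346"

stale_cvd = cache.get("daily.cvd")            # served from cache: old content
fresh_cdiff = cache.get("daily-12346.cdiff")  # new name, so fetched from origin
```

In this model the reused name returns out-of-date content, while the versioned name cannot, since nothing was ever cached under it before.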

This problem is exacerbated by the fact that the Cloudflare servers
seem to add a "Cache-Control:" HTTP header that does NOT specify
"no-cache". (I don't know what the old "volunteer" servers did in this

The upshot of this is that the Scripted Update mechanism will *never*
get out-of-date cdiff files, although it may experience a short delay
if it's the first requester of a new cdiff.

The local mirror mechanism, on the other hand is almost guaranteed to
fail on occasion -- or at least suffer arbitrary delays -- if there is a
caching proxy in its path to Cloudflare. Even if the Cloudflare servers
used a "Cache-Control: no-cache" header, there might be a rogue proxy
in the way that ignores this header, and caches anyway. (AFAIK, there
is no way to enforce "no-cache".)

So what could be done to avoid the problem?

One possibility is to give up on local mirrors. But that might increase
the load on the Cloudflare servers, as some installations might have
more local ClamAV clients than the ratio of the size of a full cvd to
the size of a typical cdiff.

A solution to that would be to use a local HTTP proxy to distribute the
cdiff files to all the ClamAV installations on the LAN. (But that would
require rather complicated setup.)

A third approach would be to do the mirroring using the cdiff-generated
cld files rather than with cvd files. I don't know what changes to
freshclam this would require. One possible obstacle to doing this is
whether the cld files are or could be cryptographically signed like the
cvds are. Something like that would likely be necessary for enterprise
security. (Presumably, generating Talos-signed cvds locally from the
clds would be a really bad idea, while setting up private PKI for local
signing would be a really big pain.)

A fourth, and I think very simple, approach would be to name cvds like
the cdiffs are named. In other words, instead of having daily.cvd, one
would have daily-12345.cvd, followed by daily-12346.cvd as the next
update. This would be impervious to the vagaries of caching. I also
think it would require only fairly trivial code changes to freshclam
and whatever component of ClamAV it is that (re)loads the database.
(All that would be necessary would be to always use the cvd with the
highest version number.)
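
As a rough sketch of that selection rule, assuming the proposed daily-12345.cvd naming (which ClamAV does not use today) and a hypothetical helper called newest_cvd:

```python
import re

# Matches names in the proposed scheme, e.g. "daily-12345.cvd".
_VERSIONED_CVD = re.compile(r"^(?P<db>[a-z]+)-(?P<ver>\d+)\.cvd$")


def newest_cvd(filenames, database):
    """Return the name with the highest version for database, or None if none match."""
    best = None
    for name in filenames:
        match = _VERSIONED_CVD.match(name)
        if match and match.group("db") == database:
            version = int(match.group("ver"))
            # Compare as integers: as strings, "9999" would sort after "12346".
            if best is None or version > best[0]:
                best = (version, name)
    return best[1] if best else None


files = ["daily-12345.cvd", "daily-12346.cvd", "daily-9999.cvd", "main-58.cvd", "daily.cvd"]
chosen = newest_cvd(files, "daily")
```

The comparison is numeric rather than alphabetical, so the rule still works when the version number gains a digit.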

Any thoughts on all this? Is local mirroring still possible?


P.S. I would have thought that since the clds are much bigger than the
corresponding cvds, loading a cld into memory would be slower than
loading the equivalent cvd, but this seems not to be the case.

To measure the load time I ran clamscan on one tiny file using the
daily.cvd version of the signatures and then using the much bigger
daily.cld. (Main.cvd remained as itself.) This was done on a fairly old
machine, and before each run I, of course, did:

   echo 1 > /proc/sys/vm/drop_caches

The result was that total real (and 'user') times were slightly less
for the cld, although the 'system' time was slightly more. I wonder what
eats up the extra time. (I thought disks were always supposed to be the
bottleneck for simple computations like decompression and crypto.)
clamav-users mailing list

Help us build a comprehensive ClamAV guide:
