On 05/23/2010 05:12 PM, Karsten Bräckelmann wrote:
> On Sun, 2010-05-23 at 10:21 +0300, Török Edwin wrote:
>>>   else
>>>     Scan it like it does now
>>>     ( with everything in the DB, I assume. )
>>>   }
>>
>> A simpler form of this is already implemented in 0.96 :)
>>
>> If a file is determined to be clean, its MD5 is added to an in-memory cache.
>> When scanning a new file, its MD5 is computed and looked up in the
>> cache. If found, it is considered clean.
>> On DB reload the entire cache is cleared.
>
> But, isn't that typically done multiple times a day?
>
> So what exactly is the use-case for this, other than doing full system
> scans more frequently than signature updates?
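[The cached-verdict scheme quoted above can be sketched roughly as follows. This is an illustrative Python sketch only, not ClamAV's actual C implementation; the names `CleanCache` and `scan` are made up for the example.]

```python
import hashlib

class CleanCache:
    """Illustrative clean-file cache keyed by MD5 (not ClamAV's real code)."""

    def __init__(self):
        self._clean = set()

    def check(self, data: bytes) -> bool:
        # A hit means this exact content was previously determined clean.
        return hashlib.md5(data).hexdigest() in self._clean

    def add(self, data: bytes) -> None:
        # Record the content as clean after a full scan found nothing.
        self._clean.add(hashlib.md5(data).hexdigest())

    def clear(self) -> None:
        # Called on DB reload: old verdicts may be wrong under new signatures.
        self._clean.clear()


def scan(data: bytes, cache: CleanCache, full_scan) -> bool:
    """Return True if clean; skip the expensive scan on a cache hit."""
    if cache.check(data):
        return True
    clean = full_scan(data)
    if clean:
        cache.add(data)
    return clean
```

Note that only *clean* verdicts are cached, and the whole cache is thrown away on every database reload, since a file that was clean under the old signatures may be detected by the new ones.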
Even when doing full system scans, you still have a cache of the last N
minutes (where N depends on how often you reload the DB). This helps with:

- duplicate files, or files present in both archived and unarchived form
- since we cache at the extracted-file level, even if only part of an
  archive/container is redundant, that part is cached
- mails containing the same attachment, which was already determined to
  be clean
- archive bombs: instead of trying to scan 2^N files until the recursion
  depth/maxfilesize limit is reached, it only needs to scan N files
  (N being the recursion depth) for a typical archive bomb that expands
  into 2 more archives at each depth
- ensuring that the bytecode won't accidentally need 2^N time to run: if
  it happens to extract a file that matches the logical signature of the
  same bytecode again, that would trigger further extraction, and so on

The last point is the reason the feature was added; however, some initial
tests have shown improved performance for nearly any kind of scan
(system, mails, home directories, etc.).

You can try it for yourself: do a normal scan with ClamAV as is, then
comment out the call to cache_check(), and measure again.

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide:
visit http://wiki.clamav.net
http://www.clamav.net/support/ml
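[To see why the archive-bomb case drops from 2^N to roughly N scans, here is a toy Python model: each "archive" expands into two *identical* children, so at every depth the second child is a cache hit and only one new scan happens per level. The `expand`/`scan_tree` names are invented for the illustration; this is not ClamAV code.]

```python
import hashlib

def expand(data: bytes):
    # Toy archive bomb: every blob expands into two identical child blobs.
    child = b"inner:" + data
    return [child, child]

def scan_tree(data: bytes, depth: int, seen: set, counter: dict) -> None:
    digest = hashlib.md5(data).hexdigest()
    if digest in seen:
        return  # cache hit: already determined clean, no rescan
    counter["scans"] += 1
    seen.add(digest)  # assume the blob scanned clean
    if depth > 0:
        for child in expand(data):
            scan_tree(child, depth - 1, seen, counter)

counter = {"scans": 0}
scan_tree(b"bomb", 10, set(), counter)
# With the cache: 11 scans (one per depth level).
# Without it, the tree has 2^11 - 1 = 2047 nodes to scan.
```

The same mechanism bounds the bytecode case: if a bytecode's extraction output matches its own logical signature again, the second pass is a cache hit instead of another round of extraction.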
