On 05/23/2010 05:12 PM, Karsten Bräckelmann wrote:
> On Sun, 2010-05-23 at 10:21 +0300, Török Edwin wrote:
>>> else
>>>     Scan it like it does now
>>>     ( with everything in the DB, I assume. )
>>> }
>>
>> A simpler form of this is already implemented in 0.96 :)
>>
>> If a file is determined to be clean, its MD5 is added to an in-memory cache.
>> When scanning a new file, its MD5 is computed and looked up in the
>> cache. If found, it is considered clean.
>> On DB reload the entire cache is cleared.
> 
> But, isn't that typically done multiple times a day?
> 
> So what exactly is the use-case for this, other than doing full system
> scans more frequently than signature updates?

Even when doing full system scans you still have a cache of the last N
minutes (where N depends on how often you reload the DB).
This helps with:
 - duplicate files, or files present in both archived and unarchived form
 - since we cache at the extracted-file level, even if only part of an
archive/container is redundant, we have that part cached
 - mails containing the same attachment, which was already determined to
be clean
 - archive bombs: instead of trying to scan 2^N files until the
recursion depth/maxfilesize limit is reached, it only needs to scan N
files (where N is the recursion depth) for a typical archive bomb that
expands into two more archives at each depth
 - ensuring that the bytecode won't accidentally need 2^N time to run:
this could happen if it extracted a file that matched the logical
signature of the same bytecode again, triggering further extraction, and
so on

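To make the archive-bomb point concrete, here is a minimal sketch of such a clean-file MD5 cache. This is illustrative only, not ClamAV's actual implementation: the names CleanCache and scan_blob are made up, and real signature matching is replaced by a scan counter.

```python
import hashlib

class CleanCache:
    """In-memory set of MD5 digests of content already found clean.

    Mirrors the behaviour described above: hits skip the scan, and
    clear() models dropping the whole cache on a DB reload.
    """
    def __init__(self):
        self._clean = set()

    def check(self, data: bytes) -> bool:
        return hashlib.md5(data).hexdigest() in self._clean

    def add(self, data: bytes) -> None:
        self._clean.add(hashlib.md5(data).hexdigest())

    def clear(self) -> None:
        # On DB reload the entire cache is cleared.
        self._clean.clear()

scans = 0  # counts how many "real" (non-cached) scans were performed

def scan_blob(cache: CleanCache, data: bytes) -> None:
    """Consult the cache first; only uncached content is really scanned."""
    global scans
    if cache.check(data):
        return                 # cache hit: already known clean
    scans += 1                 # expensive signature matching would go here
    cache.add(data)            # record the content as clean

# Simulated archive bomb: at depth d there are 2^d identical archives.
# With the cache, each distinct depth is scanned only once, so a
# depth-10 bomb costs 10 scans instead of 2^10 - 1 = 1023.
cache = CleanCache()
depth = 10
for d in range(depth):
    inner = b"archive-at-depth-%d" % d
    for _ in range(2 ** d):    # 2^d identical copies at this depth
        scan_blob(cache, inner)

print(scans)  # prints 10
```

The same lookup is what makes duplicate files, repeated mail attachments, and partially redundant containers cheap: identical bytes hash to the same digest, so only the first copy pays for a full scan.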
The last point is the reason the feature was added; however, initial
tests have shown improved performance for nearly every kind of scan
(system, mails, home directories, etc.).

You can try it for yourself: do a normal scan with ClamAV as is, then
comment out the call to cache_check() and measure again.

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml