On 04/26/2012 08:37 PM, Michael Orlitzky wrote:
> On 04/26/2012 10:32 AM, Dennis Peterson wrote:
>> On 4/25/12 7:34 AM, Michael Orlitzky wrote:
>>> On 04/25/12 07:55, Török Edwin wrote:
>>>>>
>>>>> I don't know if this can help speeding up the process but I collected 
>>>>> some statistics on
>>>>> clamscan of a small file (wallclock duration: ~25sec):
>>>>
>>>> I think I'm missing some context here: which DB files are slow to load?
>>>> The official ones? Just the sanesecurity ones? Any particular DB from the 
>>>> sanesecurity ones?
>>>
>>> My problem isn't so much that it takes a while to load the signatures,
>>> but that clamd (and thus the mail server) is effectively down the entire
>>> time.
>>
>> This has been a problem on every Sparc system I've ever installed ClamAV
>> on, and that goes back quite a few years. I still use it on several
>> Netra 500 MHz pizza boxes. It is also quite a memory hog, which is more
>> related to the available memory and number of sigs, so on
>> memory-constrained systems I've cut back on the number of SS signatures.
>> And at my peril, I might add, as they have long been the most valuable
>> in terms of results. And because of the dead time when reloading, I've
>> cut freshclam to once a day. That has resulted in a net improvement in
>> detections because of the higher availability time.
>>
> 
> The signature databases are created once, and loaded thousands of times.
> They should just be sorted, so that lookups are instantaneous.
> 
> Then it's trivial to update the databases in the background, because you
> can quickly determine if a particular signature was added or deleted.
> The wall-time-elapsed would be a bit worse, but nobody would care.

It's a bit more complicated than that. To ensure fast pattern matching,
the signatures are loaded into data structures such as an Aho-Corasick
trie.
It would be possible to add to the trie (that's what happens when loading
signatures), but removing is trickier.
And to determine what to remove, you need to go through all the
signatures in the database anyway.
Also, updating the loaded signature database in place would require the
scanning threads to take read locks, which would slow scanning down and
make updates harder (right now the loaded signature database is never
modified, hence no locks are needed).
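
To illustrate why removal is the awkward case, here is a minimal textbook
Aho-Corasick sketch in Python. It is not ClamAV's actual matcher; the
class names, the remove() helper and the signature names are made up for
the example. Each node caches matches inherited through failure links, so
removing a pattern at its own node leaves stale entries in other nodes
until the links are rebuilt.

```python
from collections import deque


class Node:
    def __init__(self):
        self.children = {}  # next character -> Node
        self.fail = None    # failure link, filled in by build()
        self.own = []       # names of patterns ending exactly at this node
        self.out = []       # own + names inherited via failure links (derived)


class AhoCorasick:
    """Textbook sketch (hypothetical, not libclamav code)."""

    def __init__(self):
        self.root = Node()

    def add(self, name, pattern):
        # Adding is a plain trie insert; build() must run before searching.
        node = self.root
        for ch in pattern:
            node = node.children.setdefault(ch, Node())
        node.own.append(name)

    def remove(self, name, pattern):
        # Only touches the pattern's own end node. The cached `out` lists of
        # this node and of nodes that fail into it stay stale until build().
        node = self.root
        for ch in pattern:
            node = node.children[ch]
        node.own.remove(name)

    def build(self):
        # Breadth-first pass that recomputes every failure link and `out`.
        root = self.root
        root.fail = root
        root.out = list(root.own)
        queue = deque()
        for child in root.children.values():
            child.fail = root
            child.out = list(child.own)
            queue.append(child)
        while queue:
            node = queue.popleft()
            for ch, child in node.children.items():
                f = node.fail
                while f is not root and ch not in f.children:
                    f = f.fail
                child.fail = f.children.get(ch, root)
                child.out = child.own + child.fail.out
                queue.append(child)

    def search(self, text):
        # Returns (end_index, name) for every pattern occurrence in text.
        node = self.root
        hits = []
        for i, ch in enumerate(text):
            while node is not self.root and ch not in node.children:
                node = node.fail
            node = node.children.get(ch, self.root)
            hits.extend((i, name) for name in node.out)
        return hits
```

In this sketch, removing "he" and then searching without calling build()
still reports it through the cached list on the "she" node. A full
rebuild fixes that, which is essentially a reload.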

It would be easier to move reload_db to a separate thread and keep
scanning with the old database while the reload runs.
When the reload finishes, atomically replace the engine pointer and free
the old engine.
The downside is that memory usage doubles during the reload, but there is
no downtime, so this should probably be controlled by a flag in
clamd.conf.
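
A rough sketch of that idea, in Python rather than clamd's C, and with
made-up names (Engine, Scanner, load_fn are all hypothetical): the slow
load happens in a background thread, scanners keep using whatever engine
they grabbed, and the swap is a single reference assignment. In C the
old engine would additionally need reference counting (or similar) so it
is only freed once the last in-flight scan has finished with it; here
the garbage collector plays that role.

```python
import threading


class Engine:
    """Hypothetical stand-in for a loaded, read-only signature database."""

    def __init__(self, version, signatures):
        self.version = version
        self.signatures = frozenset(signatures)  # never modified after load

    def scan(self, data):
        # Naive substring matching, only to give the engine some behavior.
        return sorted(sig for sig in self.signatures if sig in data)


class Scanner:
    def __init__(self, engine):
        self._engine = engine
        self._swap_lock = threading.Lock()  # serializes reloaders, not scans

    def scan(self, data):
        # Take one snapshot of the current engine and use it for the whole
        # scan, so a swap in the middle cannot change databases under us.
        engine = self._engine
        return engine.version, engine.scan(data)

    def reload(self, load_fn):
        # load_fn builds a complete new Engine; this is the slow part and
        # runs while scans continue against the old engine.
        def work():
            new_engine = load_fn()
            with self._swap_lock:
                self._engine = new_engine  # old engine dropped once unused

        thread = threading.Thread(target=work)
        thread.start()
        return thread
```

While work() is still inside load_fn(), both engines are alive, which is
the doubled memory usage mentioned above.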

https://bugzilla.clamav.net/show_bug.cgi?id=790#c14

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
