On 04/26/2012 08:37 PM, Michael Orlitzky wrote:
> On 04/26/2012 10:32 AM, Dennis Peterson wrote:
>> On 4/25/12 7:34 AM, Michael Orlitzky wrote:
>>> On 04/25/12 07:55, Török Edwin wrote:
>>>>>
>>>>> I don't know if this can help speeding up the process but I collected
>>>>> some statistics on clamscan of a small file (wallclock duration: ~25sec):
>>>>
>>>> I think I'm missing some context here: which DB files are slow to load?
>>>> The official ones? Just the sanesecurity ones? Any particular DB from the
>>>> sanesecurity ones?
>>>
>>> My problem isn't so much that it takes a while to load the signatures,
>>> but that clamd (and thus the mail server) is effectively down the entire
>>> time.
>>
>> This has been a problem on every Sparc system I've ever installed ClamAV on
>> and that goes back quite a few years. I still use in on several Netra 500 mHz
>> pizza boxes. It is also quite a memory hole which is more related to the
>> available memory and number of sigs, so on memory constrained systems I've
>> cut back on the number of SS signatures. And at my peril, I might add, as
>> they have long been the most valuable in terms of results. And because of
>> the dead time when reloading I've cut freshclam to once a day. That has
>> resulted in a net improvement in detections because of the higher
>> availability time.
>>
>
> The signature databases are created once, and loaded thousands of times.
> They should just be sorted, so that lookups are instantaneous.
>
> Then it's trivial to update the databases in the background, because you
> can quickly determine if a particular signature was added or deleted.
> The wall-time-elapsed would be a bit worse, but nobody would care.

It's a bit more complicated than that. To ensure fast pattern matching, the
signatures are loaded into an Aho-Corasick trie, for example. It would be
possible to add to the trie (that's what happens when loading signatures), but
removing from it is trickier. And to determine what to remove, you need to go
through all the signatures in the database anyway.

Also, updating the loaded signature database in place would require the
scanning threads to take read locks, which would slow scanning down and make
the update itself harder (right now the loaded signature database is never
modified, hence no locks are needed).

It would be easier to just move reload_db to a different thread and allow
scanning with the old database during the DB reload. Then, when the DB reload
is finished, atomically replace the engine pointer and free the old engine.
The downside would be twice the memory usage during the reload, but you don't
have downtime, so this should probably be controlled by a flag in clamd.conf.

https://bugzilla.clamav.net/show_bug.cgi?id=790#c14

Best regards,
--Edwin
