Hi,

There's a thread on clamav-users with the title "PhishingScanURLs is dreadfully slow/CPU-intensive" that I think the developers should be aware of. Below is my latest post on the topic. Basically, I strongly urge the developers to make PhishingScanURLs default to off instead of on.

Regards,

David.

-------- Original Message --------
Subject: Re: [Clamav-users] PhishingScanURLs is dreadfully slow/CPU-intensive
Date: Tue, 30 Oct 2007 11:15:21 -0400
From: David F. Skoll <[EMAIL PROTECTED]>
To: ClamAV users ML <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Graham Toal wrote:

> In fact with a decent string search algorithm (using a trie of
> strings) there should be very little extra overhead in adding more
> strings to be searched in parallel.

PhishingScanURLs does not use string matching. It uses regexes, and regex matching with backreferences is NP-hard in general (though I don't think Clam uses backreferences, which are the worst culprits). Even without them, a backtracking regex matcher can take exponential time on unlucky input.

It also involves calls to cli_html_normalise, which looks scary/expensive. cli_html_normalise is almost 1100 lines long and is filled with fixed-length buffer declarations. While that does not necessarily mean it's a security risk, it still sends shivers up my spine. Nobody should be writing 1100-line functions!

See libclamav/phishcheck.c and libclamav/htmlnorm.c for the code in question.

> You're right in your assessment above. It should be simple and
> lightweight. That doesn't rule out scanning for URLs in the body
> text, it just means you have to do so efficiently, and IMHO using
> regexps is not efficient and seldom justified.

Exactly. :-) So the Clam people should not be using regexes.

(Our customers, in fact, always run ClamAV in conjunction with an anti-spam scanner, so it's no benefit to them to have Clam try to do anti-spam.)

Regards,

David.
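
P.S. For anyone unfamiliar with the "trie of strings" approach Graham mentions, here is a minimal sketch of it in Python (an Aho-Corasick matcher). It is purely illustrative and is not ClamAV code: the function names build_automaton and find_all and the sample patterns are made up for this example. The point it shows is that the text is walked once, one character at a time, no matter how many strings are in the set.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie of the patterns plus Aho-Corasick failure links."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Adding more strings makes the trie bigger, but the scan still touches each input character once, apart from the bounded work of following failure links. That is the "very little extra overhead" Graham is talking about.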