On 2008-12-03 03:02, Thomasz Blaszczyk wrote:
> Hi,

Hi,
> I am new to CLAMAV & I am just wonder how files are scanned.
>
> Does it work like:
> 1. PE section is taken from file to be scanned

It is much more than that: ClamAV can also process a variety of archive
formats, containers, and executable packers. Also, PE files aren't the
only malware files; scripts can contain malware too. Have a look at
filetypes_int.h for the file types we support. New file type
definitions can be added via database updates.

> 2. MD5 is calculated

Correct, but ClamAV also uses a pattern matcher (Aho-Corasick and an
extended version of Boyer-Moore), not only MD5. See signatures.pdf for
the kinds of patterns it supports (in particular, the AC matcher
supports wildcards). So ClamAV actually tries to match those patterns
inside the file. It also has some heuristic and algorithmic detections.

An MD5 is calculated for the entire file, and also for each PE section.

> 3. That MD5 is compared to all signatures in ClamAV Database

Using a BM matcher, yes, not by comparing it against each signature
sequentially.

> 4. If match virus is found.

Yes.

> I have simplified this. But please let me know if I am right in above
> steps for scanning files.

If you only have a database with MD5 signatures loaded, and you disable
archive scanning, algorithmic detections, heuristics, and the HTML and
mbox parsers, then yes ;)

In practice, ClamAV does much more than just matching an MD5.

Best regards,
--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
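To make the difference between "hash the file and look it up" and "search the file for byte patterns" concrete, here is a minimal Python sketch. It is not ClamAV's code (libclamav is C) and makes several assumptions: a Python dict stands in for the BM-based MD5 lookup, the Aho-Corasick matcher handles plain byte strings only (no wildcards), the (offset, size) section list is hypothetical input that a real PE parser would supply, and all signature names are made up.

```python
from __future__ import annotations

import hashlib
from collections import deque


def md5_lookup(data: bytes, md5_db: dict) -> str | None:
    """Whole-file MD5 check: hash once, then do one dict lookup (no sequential scan)."""
    return md5_db.get(hashlib.md5(data).hexdigest())


def section_md5_lookup(data: bytes, sections: list, section_db: dict) -> str | None:
    """Per-section MD5 check (illustrative).

    `sections` is a hypothetical list of (offset, size) pairs; keys in
    `section_db` are (size, md5hex) tuples.
    """
    for offset, size in sections:
        digest = hashlib.md5(data[offset:offset + size]).hexdigest()
        name = section_db.get((size, digest))
        if name is not None:
            return name
    return None


class AhoCorasick:
    """Minimal Aho-Corasick automaton over byte patterns (no wildcards)."""

    def __init__(self, patterns: dict):
        self.goto = [{}]   # per-state transitions: byte -> next state
        self.fail = [0]    # per-state failure link
        self.out = [[]]    # per-state list of signature names that end here
        # Build the trie of all patterns.
        for pat, name in patterns.items():
            state = 0
            for b in pat:
                if b not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][b] = len(self.goto) - 1
                state = self.goto[state][b]
            self.out[state].append(name)
        # Breadth-first pass computes failure links; depth-1 states keep fail = 0.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for b, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(b, 0)
                # Inherit matches that end at the longest proper suffix.
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def scan(self, data: bytes) -> list:
        """Return (end_offset, signature_name) for every pattern occurrence, in one pass."""
        hits = []
        state = 0
        for i, b in enumerate(data):
            while state and b not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(b, 0)
            for name in self.out[state]:
                hits.append((i, name))
        return hits


if __name__ == "__main__":
    sample = b"header...EVILCODE...trailer"
    md5_db = {hashlib.md5(b"known bad file").hexdigest(): "Hypothetical.MD5.Sig"}
    body_sigs = {b"EVILCODE": "Hypothetical.Body.Sig", b"CODE": "Hypothetical.Short.Sig"}
    print(md5_lookup(sample, md5_db))
    print(AhoCorasick(body_sigs).scan(sample))
```

The MD5 check only fires when the whole file (or a whole section) is byte-for-byte identical to a known sample, whereas the automaton finds every listed pattern anywhere in the data in a single pass, which is why a one-byte change defeats the first but not necessarily the second.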