On Mon, 26 Jan 2009, Tom Shaw wrote:
* 0 = any file
* 1 = Portable Executable
* 2 = OLE2 component (e.g. a VBA script)
* 3 = HTML (normalised)
* 4 = Mail file
* 5 = Graphics
* 6 = ELF
* 7 = ASCII text file (normalised)
but how does clamd tell what kind of file it is
so it knows what rule types need to be run? If
it's a "mail file" does it automatically deal with
attachments and MIME types and character sets?
I assume Clam looks at the MIME types of message parts to determine what
sort of signature to use. Perhaps it uses "file" type heuristics too. At
any rate, it splits the message up into its parts, decodes and
normalises them, and saves them in individual temp files, which it
then scans.
Save the message you want to create a signature for to some temp
directory. Edit it to make sure there's no leading extraneous stuff such
as mbox "From " lines; it should start with the Return-Path: or Received:
headers.
Run
clamscan --tempdir=. --leave-temps filename
where "filename" is where you saved the message to. You should now find
clam's temp files hanging about; a set of files
or directories called "clamav-" plus a long hex string, e.g.
clamav-95bb346e26fab14ffc15e577fdb19543
These represent the message's MIME parts, normalised. A text/plain part
will be represented by a plain temp file; the text will have upper case
mapped to lower, runs of white space (including line breaks) mapped to
single spaces, and 8 bit characters elided. Text in these files can be
matched with a type 7 signature:
SigName:7:*:hexsig
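
To get a feel for what ends up in those temp files, here is a rough Python
sketch of the normalisation described above. It is only an approximation
based on that description, not ClamAV's own code, and the names
normalise_text, type7_signature and Local.example are made up for the
example. Always check the result against the real temp file.

```python
import re

def normalise_text(raw: bytes) -> bytes:
    """Approximate the text normalisation described above (not ClamAV's code)."""
    # Elide 8-bit characters: keep only bytes below 0x80.
    seven_bit = bytes(b for b in raw if b < 0x80)
    # Map upper case to lower case (bytes.lower() only touches ASCII letters).
    lowered = seven_bit.lower()
    # Collapse runs of white space, including line breaks, to a single space.
    return re.sub(rb"\s+", b" ", lowered)

def type7_signature(name: str, text: bytes) -> str:
    """Build a 'SigName:7:*:hexsig' line from already-normalised text."""
    return f"{name}:7:*:{text.hex()}"

sample = b"Buy  CHEAP\r\nPills\xa0 now"
normalised = normalise_text(sample)
print(normalised)                                   # b'buy cheap pills now'
print(type7_signature("Local.example", normalised))
```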
A text/html part will be represented by a directory containing:
nocomment.html - the HTML normalised as for plain text, with the HTML
comments stripped but other tags intact
notags.html - as above but all tags stripped
rfc2397 - a directory that is usually empty. I don't think I have ever
seen a "data:" URL in real life.
Text in nocomment.html and notags.html can be matched with a type 3
signature:
SigName:3:*:hexsig
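
If a message leaves a lot of temp entries behind, a small helper can sort
them by the layout described above. This is a sketch under assumptions: it
treats every clamav-* directory as a text/html part (type 3) and every
plain clamav-* file as normalised text (type 7), which will be wrong for
binary attachments. The function name suggest_targets and the mock
directory names are invented for the example; the mock layout only stands
in for what clamscan --leave-temps would leave behind.

```python
import tempfile
from pathlib import Path

def suggest_targets(tempdir):
    """Guess a signature target type for each clamav-* entry in tempdir."""
    suggestions = {}
    for entry in sorted(Path(tempdir).glob("clamav-*")):
        if entry.is_dir():
            # Assumed to be a text/html part: list its .html files.
            html = sorted(p.name for p in entry.iterdir() if p.suffix == ".html")
            suggestions[entry.name] = (3, html)
        else:
            # Assumed to be a normalised text/plain part.
            suggestions[entry.name] = (7, [entry.name])
    return suggestions

# Mock layout imitating the temp files described above.
with tempfile.TemporaryDirectory() as tmp:
    html_part = Path(tmp) / "clamav-aaaa"
    (html_part / "rfc2397").mkdir(parents=True)
    (html_part / "nocomment.html").write_text("<b>hello</b>")
    (html_part / "notags.html").write_text("hello")
    (Path(tmp) / "clamav-bbbb").write_text("hello")
    result = suggest_targets(tmp)

for name, (target, files) in result.items():
    print(name, "-> type", target, files)
```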
A type 4 signature can be used to match text in the original mail file.
This is not normalised so you have to match any line breaks and white
space exactly. Less forgiving than 3 or 7, but you can match headers
with this, or anchor text to line endings:
Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
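
Decoded, that signature is a blank line, "http://", a gap of at most 50
bytes, "/zoosextour" and another blank line. The Python sketch below
translates the hex runs and the {-n} gap into a regular expression so the
pattern can be tried against raw message bytes. It handles only that small
subset of the signature syntax, the function name ndb_body_to_regex is made
up, and it is no substitute for testing with clamscan itself.

```python
import re

def ndb_body_to_regex(hexsig: str):
    """Translate hex runs and {-n} gaps into a bytes regex (subset only)."""
    pattern = b""
    # Split on {-n} wildcards, keeping them in the result list.
    for part in re.split(r"(\{-\d+\})", hexsig):
        if part.startswith("{-"):
            max_gap = int(part[2:-1])
            pattern += b".{0,%d}" % max_gap      # n or fewer arbitrary bytes
        elif part:
            pattern += re.escape(bytes.fromhex(part))
    return re.compile(pattern, re.DOTALL)        # let '.' match line breaks too

sig = "0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a"
rx = ndb_body_to_regex(sig)

# example.invalid is a placeholder host, not taken from any real message.
message = b"Subject: hi\n\nhttp://example.invalid/zoosextour\n\nbye\n"
print(bool(rx.search(message)))                  # True
```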
Occasionally you can use other types. Type 2 will match an Office
document. The only time I have used this was to match an attached
spreadsheet which contained an ad for a pills website.
As for the hex sig itself, cut and paste text from the temp file into
"sigtool --hex-dump" and paste the output onto the end of the sig. Put
your sig into a file in the local directory with a name ending in ".ndb"
- say local.ndb - then run clamscan again to see if it matches:
$ cat local.ndb
Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
$ clamscan -d . filename
filename: Local.zoosextour.UNOFFICIAL FOUND
----------- SCAN SUMMARY -----------
Known viruses: 248
Engine version: 0.94.2
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Time: 0.361 sec (0 m 0 s)
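
If you would rather not cut and paste hex by hand, the signature line can
also be assembled with a few lines of Python. This is only a sketch: the
helper name ndb_line is invented, and it covers just the simple case of
text fragments joined by a "{-n}" gap, as in the signature used above.

```python
def ndb_line(name, target, fragments, max_gap):
    """Join hex-encoded fragments with a {-n} gap into a .ndb signature line."""
    gap = "{-%d}" % max_gap
    hexsig = gap.join(frag.hex() for frag in fragments)
    return f"{name}:{target}:*:{hexsig}"

line = ndb_line("Local.zoosextour", 4, [b"\n\nhttp://", b"/zoosextour\n\n"], 50)
print(line)
# Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
```

Redirect the output into local.ndb and test it with clamscan as shown
above.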
I hope that helps
Scott Larnach, Edinburgh University