On Mon, 26 Jan 2009, Tom Shaw wrote:

* 0 = any file
* 1 = Portable Executable
* 2 = OLE2 component (e.g. a VBA script)
* 3 = HTML (normalised)
* 4 = Mail file
* 5 = Graphics
* 6 = ELF
* 7 = ASCII text file (normalised)

but how does clamd tell what kind of file it is
so it knows what rule types need to be run?  If
it's a "mail file" does it automatically deal with
attachments, MIME types and character sets?

I assume Clam looks at the MIME types of message parts to determine what sort of signature to use. Perhaps it uses "file" type heuristics too. At any rate, it splits each message into its parts, decodes and normalises them, and saves them in individual temp files, which it then scans.

Save the message you want to create a signature for to some temp directory. Edit it to make sure there's no leading extraneous stuff such as mbox "From " separator lines; it should start with the Return-Path: or Received: headers.

Run

  clamscan --tempdir=. --leave-temps filename

where "filename" is the file you saved the message to. You should now find clam's temp files hanging about: a set of files or directories called "clamav-" plus a long hex string, e.g.

  clamav-95bb346e26fab14ffc15e577fdb19543

These represent the message's MIME parts, normalised. A text/plain part will be represented by a plain temp file; the text will have upper case mapped to lower, runs of white space (including line breaks) mapped to single spaces, and 8 bit characters elided. Text in these files can be matched with a type 7 signature:

  SigName:7:*:hexsig
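
To get a feel for what a type 7 signature has to match, the normalisation described above can be approximated in a few lines of Python. This is only a sketch of the behaviour as I described it (lower-casing, collapsing white space, dropping 8-bit characters), not ClamAV's actual code, and the function name normalise_text is made up for illustration. The real temp file is the authority; use this only to predict roughly what it will contain.

```python
import re

def normalise_text(raw: bytes) -> bytes:
    """Approximate the plain-text normalisation described above.

    Assumption: this mirrors the description in this message only;
    ClamAV's real normaliser may differ in detail.
    """
    # Drop 8-bit characters (anything outside 7-bit ASCII).
    seven_bit = bytes(b for b in raw if b < 0x80)
    # Map upper case to lower case.
    lowered = seven_bit.lower()
    # Collapse runs of white space, including line breaks, to one space.
    return re.sub(rb"\s+", b" ", lowered)

print(normalise_text(b"Buy  CHEAP\r\n\tPills\xa0now"))
```

Note that the line break in the input has become a single space, so a type 7 signature should be built from the text as it appears in the temp file, not as it appears in the original message.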

A text/html part will be represented by a directory containing:

  nocomment.html - the HTML normalised as for plain text, with the HTML
  comments stripped but other tags intact

  notags.html - as above but all tags stripped

  rfc2397 - a directory that is usually empty. I don't think I have ever
  seen a "data:" URL in real life.

Text in nocomment.html and notags.html can be matched with a type 3 signature:

  SigName:3:*:hexsig

A type 4 signature can be used to match text in the original mail file. This is not normalised so you have to match any line breaks and white space exactly. Less forgiving than 3 or 7, but you can match headers with this, or anchor text to line endings:

 Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
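
Decoded, that signature is a blank line, then "http://", then a gap, then "/zoosextour" followed by another blank line. As far as I know, {-50} in a ClamAV hex signature means "50 bytes or fewer". The Python below is a rough sketch that turns such a signature body into a regular expression so you can try it against sample text before involving clamscan. It assumes only plain hex bytes and {-n} gaps; the other wildcards ClamAV supports are not handled, and hexsig_to_regex is a made-up helper name, not part of any ClamAV tool.

```python
import re

def hexsig_to_regex(hexsig: str):
    """Translate a simple hex signature into a compiled bytes regex.

    Assumption: only hex byte pairs and {-n} gaps ("n or fewer bytes")
    are supported; anything else raises ValueError.
    """
    tokens = re.findall(r"\{-\d+\}|[0-9a-fA-F]{2}", hexsig)
    if "".join(tokens) != hexsig:
        raise ValueError("unsupported construct in signature")
    parts = []
    for token in tokens:
        if token.startswith("{"):
            # {-n}: match between 0 and n arbitrary bytes.
            parts.append(b".{0,%d}" % int(token[2:-1]))
        else:
            # A literal byte, escaped so regex metacharacters stay literal.
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL so that the gap may also span line breaks.
    return re.compile(b"".join(parts), re.DOTALL)

sig = "0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a"
pattern = hexsig_to_regex(sig)
sample = b"hello\n\nhttp://example.invalid/zoosextour\n\nbye"
print(bool(pattern.search(sample)))
```

The sample message body here is invented. A host name longer than 50 bytes would make the same signature miss, which is the point of the bounded gap.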

Occasionally you can use other types. Type 2 will match an Office document. The only time I have used this was to match an attached spreadsheet which contained an ad for a pills website.

As for the hex sig itself, cut and paste text from the temp file into "sigtool --hex-dump" and paste the output onto the end of the sig. Put your sig into a file in the local directory with a name ending in ".ndb" - say local.ndb - then run clamscan again to see if it matches:

  $ cat local.ndb
  Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
  $ clamscan -d . filename
  filename: Local.zoosextour.UNOFFICIAL FOUND

  ----------- SCAN SUMMARY -----------
  Known viruses: 248
  Engine version: 0.94.2
  Scanned directories: 0
  Scanned files: 1
  Infected files: 1
  Data scanned: 0.00 MB
  Time: 0.361 sec (0 m 0 s)
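
If you would rather script the signature line than paste hex by hand, the following Python sketch builds a line in the Name:Type:Offset:HexSig layout used above. It is an illustration, not a replacement for sigtool: make_ndb_line and the signature name Local.example are made up, and it simply hex-encodes the ASCII bytes of the text you give it, so a trailing newline in the input would be encoded as 0a as well.

```python
def make_ndb_line(name: str, target_type: int, text: str, offset: str = "*") -> str:
    """Build one .ndb line: Name:TargetType:Offset:HexSignature.

    Assumption: the text is plain ASCII, copied from the normalised
    temp file (types 3 and 7) or the raw message (type 4).
    """
    # Hex-encode every byte of the text, two lower-case digits per byte.
    hexsig = text.encode("ascii").hex()
    return f"{name}:{target_type}:{offset}:{hexsig}"

# Hypothetical type 7 signature for some normalised plain text.
print(make_ndb_line("Local.example", 7, "cheap pills"))
```

Append the resulting line to local.ndb and test it with clamscan -d . as shown above.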

I hope that helps.

Scott Larnach, Edinburgh University
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.