On 2009-01-27 15:16, [email protected] wrote:
> On Mon, 26 Jan 2009, Tom Shaw wrote:
>
>> * 0 = any file
>> * 1 = Portable Executable
>> * 2 = OLE2 component (e.g. a VBA script)
>> * 3 = HTML (normalised)
>> * 4 = Mail file
>> * 5 = Graphics
>> * 6 = ELF
>> * 7 = ASCII text file (normalised)
>>
>> but how does clamd tell what kind of file it is
>> so it knows what rule types need to be run?  If
>> it's a "mail file" does it automatically deal with
>> attachments and MIME types and character sets?
>
> I assume Clam looks at the MIME types of message parts to determine
> what sort of signature to use. Perhaps it uses "file" type heuristics
> too. At any rate, it splits each message up into its parts, decodes
> and normalises them, and saves them in individual temp files,
> which it then scans.

Look at daily.ftm; it contains the rules used to detect the file type.
The distinction between binary and text is made by the engine itself.
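
To illustrate the idea of magic-byte rules, here is a minimal Python
sketch. It is not ClamAV's code, and the table is a hand-picked
assumption covering three well-known file headers; the real rules are
the ones in daily.ftm. The name guess_target_type is hypothetical.

```python
# Illustrative only: a tiny magic-byte table, NOT the contents of daily.ftm.
# Keys are leading bytes; values are target type numbers from the list above.
MAGIC_TO_TARGET = {
    b"MZ": 1,                                # DOS/PE executables begin with "MZ"
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": 2,  # OLE2 compound document header
    b"\x7fELF": 6,                           # ELF header
}


def guess_target_type(data: bytes) -> int:
    """Return a target type number, or 0 ("any file") if nothing matches."""
    for magic, target in MAGIC_TO_TARGET.items():
        if data.startswith(magic):
            return target
    return 0


print(guess_target_type(b"\x7fELF\x02\x01\x01"))  # 6
```

Anything the table does not recognise falls back to 0, so only the
"any file" signatures would apply in this simplified picture.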

>
> Save the message you want to create a signature for to some temp
> directory. Edit it to make sure there's no leading extraneous stuff
> such as fromspace lines; it should start with the Return-Path: or
> Received: headers.
>
> Run
>
>   clamscan --tempdir=. --leave-temps filename
>
> where "filename" is where you saved the message to. You should now
> find clam's temp files hanging about; a set of files or directories
> called "clamav-" plus a long hex string, e.g.
>
>   clamav-95bb346e26fab14ffc15e577fdb19543
>
> These represent the message's MIME parts, normalised. A text/plain
> part will be represented by a plain temp file; the text will have
> upper case mapped to lower, runs of white space (including line
> breaks) mapped to single spaces, and 8 bit characters elided.

Control characters are elided as well.
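
The normalisation rules described above can be approximated in a few
lines of Python. This is a sketch of those four rules only, not the
engine's actual normaliser; the order of the steps (elide first, then
collapse white space) is my assumption, and normalise_text is a
hypothetical name.

```python
import re

WHITESPACE = b" \t\r\n\f\v"


def normalise_text(raw: bytes) -> bytes:
    """Approximate the text normalisation described in this thread."""
    # Map upper case to lower case (bytes.lower() only touches ASCII letters).
    lowered = raw.lower()
    # Elide 8-bit bytes and control characters, but keep white space for now
    # so that line breaks can still be collapsed in the next step.
    kept = bytes(
        b for b in lowered
        if b in WHITESPACE or 0x21 <= b <= 0x7e
    )
    # Collapse runs of white space (including line breaks) to a single space.
    return re.sub(rb"[ \t\r\n\f\v]+", b" ", kept)


print(normalise_text(b"Hello,\r\n  WORLD\x00 caf\xe9!"))
# b'hello, world caf!'
```

Running text through something like this before hex-encoding it gives
a rough preview of what a type 7 signature would have to match, but the
temp files left by clamscan remain the authoritative reference.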

> Text in these files can be matched with a type 7 signature:
>
>   SigName:7:*:hexsig
>
> A text/html part will be represented by a directory containing:
>
>   nocomment.html - the HTML normalised as for plain text, with the HTML
>   comments stripped but other tags intact
>
>   notags.html - as above but all tags stripped

It may also contain a normalized 'javascript' file, which can be matched
with a type 7 signature.

>
>   rfc2397 - a directory that is usually empty. I don't think I have ever
>   seen a "data:" URL in real life.
>
> Text in nocomment.html and notags.html can be matched with a type 3
> signature.
>
>   SigName:3:*:hexsig
>
> A type 4 signature can be used to match text in the original mail
> file. This is not normalised so you have to match any line breaks and
> white space exactly. Less forgiving than 3 or 7, but you can match
> headers with this, or anchor text to line endings:
>
>  Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
>
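
To see what the example signature above actually matches, it helps to
decode it. The Python sketch below translates a hex signature into a
regular expression; it only understands plain hex bytes and the {-n}
wildcard ("n or fewer bytes"), which is all this signature uses. The
name sig_to_regex and the sample message are made up for illustration.

```python
import re


def sig_to_regex(hexsig: str):
    """Translate plain hex plus {-n} wildcards into a compiled bytes regex."""
    pattern = b""
    # Split on {-n} tokens, keeping them in the result list.
    for part in re.split(r"(\{-\d+\})", hexsig):
        if part.startswith("{-"):
            limit = int(part[2:-1])
            # {-n} means "between 0 and n arbitrary bytes".
            pattern += f".{{0,{limit}}}".encode("ascii")
        elif part:
            # A run of hex digits is a literal byte string.
            pattern += re.escape(bytes.fromhex(part))
    # DOTALL lets "." match line breaks too, since the gap may be any byte.
    return re.compile(pattern, re.DOTALL)


SIG = "0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a"
print(bytes.fromhex("0a0a687474703a2f2f"))          # b'\n\nhttp://'
print(bytes.fromhex("2f7a6f6f736578746f75720a0a"))  # b'/zoosextour\n\n'

message = b"Hi\n\nhttp://example.invalid/zoosextour\n\nbye"
print(bool(sig_to_regex(SIG).search(message)))      # True
```

So the signature anchors a URL between two blank lines: "http://",
then at most 50 bytes of host name, then "/zoosextour".
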
> Occasionally you can use other types. Type 2 will match an Office
> document. The only time I have used this was to match an attached
> spreadsheet which contained an ad for a pills website.
>
> As for the hex sig itself, cut and paste text from the temp file into
> "sigtool --hex-dump" and paste the output onto the end of the sig. Put
> your sig into a file in the local directory with a name ending in
> ".ndb" - say local.ndb - then run clamscan again to see if it matches:
>
>   $ cat local.ndb
>   Local.zoosextour:4:*:0a0a687474703a2f2f{-50}2f7a6f6f736578746f75720a0a
>   $ clamscan -d . filename
>   filename: Local.zoosextour.UNOFFICIAL FOUND
>
>   ----------- SCAN SUMMARY -----------
>   Known viruses: 248
>   Engine version: 0.94.2
>   Scanned directories: 0
>   Scanned files: 1
>   Infected files: 1
>   Data scanned: 0.00 MB
>   Time: 0.361 sec (0 m 0 s)
>
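
If you would rather script the hex-encoding step than paste into
sigtool by hand, something like the following Python sketch produces a
line in the same Name:Type:Offset:HexSig layout used above. It is only
a convenience under that assumption, not a replacement for sigtool, and
make_ndb_line is a hypothetical helper name.

```python
def make_ndb_line(name: str, target: int, text: str, offset: str = "*") -> str:
    """Build a 'Name:Type:Offset:HexSig' line from literal text."""
    # Hex-encode the text byte for byte; note that any trailing newline
    # in `text` is encoded as well (as 0a).
    hexsig = text.encode("ascii").hex()
    return f"{name}:{target}:{offset}:{hexsig}"


print(make_ndb_line("Local.zoosextour", 4, "\n\nhttp://"))
# Local.zoosextour:4:*:0a0a687474703a2f2f
```

Wildcards such as {-50} still have to be spliced in by hand between
the encoded fragments.
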
> I hope that helps

Nice summary of signature types.

Best regards,
--Edwin
