On 2009-04-26 18:21, Tom Shaw wrote:
> .ndb questions
>
> TargetType is confusing and very unclear.
>
> Type 2 What exactly is type 2? I first read this and thought
> it was OLE executables but further reading indicates it might also
> include Excel, Word VB and other Microsoft files. True?
Type 2 is for files contained inside an OLE container. This includes
files inside Excel, Word, etc. since those are OLE2 containers too.
These files can be images, embedded executables, VBA scripts, ...
> Are they
> normalized?
>
VBA macros are decoded, everything else is simply extracted and scanned
with type 2 signatures.
> Type 3 What exactly is normalized HTML?
Whitespace is transformed to spaces, tags and tag attributes are
normalized, and everything is lowercased.
To see the result, run clamscan --leave-temps --tempdir=. yourfile.html
and look at the files created.
> What happens to non
> ascii/Latin encodings? UTF?
HTML entities and character references are decoded, everything in the
ASCII range (<0x80) is output as is, and the rest is transformed to
&#xNNNN;
> Line terminators (\r,\r\n,\n)?
All consecutive whitespace is replaced with a single space.
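
As a rough illustration of the character handling described above, here is a Python sketch. It is not ClamAV's code: the function name is made up for this example, and the lowercase, 4-digit hex form of &#xNNNN; is an assumption. Tag and attribute normalization is not covered.

```python
import html
import re

def normalize_html_text(text: str) -> str:
    """Approximate the type 3 character handling described in this thread."""
    # Decode HTML entities and numeric character references (&amp;, &#233;, ...).
    decoded = html.unescape(text)
    # Collapse every run of ASCII whitespace (\r, \n, \r\n, tabs, ...) to one space.
    collapsed = re.sub(r"\s+", " ", decoded, flags=re.ASCII)
    out = []
    for ch in collapsed:
        if ord(ch) < 0x80:
            # ASCII range: output as is, lowercased.
            out.append(ch.lower())
        else:
            # Everything else becomes a hex character reference
            # (digit case and width are assumptions).
            out.append("&#x%04x;" % ord(ch))
    return "".join(out)

print(normalize_html_text("A&amp;B\r\n  caf\u00e9"))
```

Comparing this against the temp files left by --leave-temps is the reliable way to see what the scanner really produces.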
> PHP,
>
No special treatment for PHP.
> Javascript
Javascript is normalized too:
- strings are normalized (hex encoding is decoded)
- numbers are parsed and normalized
- local variables/function names are normalized to 'n001' format
- argument to eval() is parsed as JS again
- unescape() is handled
- some simple JS packers are handled
- output is whitespace normalized
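
To give an idea of what two of these steps amount to (hex-encoded strings and unescape()), here is a small Python sketch. The function names are hypothetical, it is not the normalizer's actual implementation, and it ignores corner cases such as an escaped backslash in front of the x.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")
_PERCENT_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def decode_js_hex_escapes(literal_body: str) -> str:
    """Replace \\xNN escapes in the body of a JS string literal with the characters they encode."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), literal_body)

def js_unescape(value: str) -> str:
    """Mimic JavaScript's unescape(): decode %XX and %uXXXX sequences."""
    return _PERCENT_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), value
    )

print(decode_js_hex_escapes(r"\x65\x76\x61\x6c"))
print(js_unescape("%61%6c%65%72%74"))
```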
> and html escapes and html entities? Does this type get
> applied to Mail?
It gets applied to the HTML part of mail (if any).
> Does this type get applied to Mail when there are no
> HTML MIME sections? What other files is it applied to?
>
It is applied to any file that matches the HTML filetype signature in
daily.ftm.
> Type 4 signatures appear not to operate on any file that
> doesn't look like an 2821 document.
Type 4 is for mails, yes.
> Is this true? Are the internal
> encoding (such as QP or B64) decoded before applying signatures? In
> QP are =\r\n removed? For 8-bit mail what is done for the non-ascii
> encodings? Upper/lower case? Line terminators (\r,\r\n,\n)? Should
> UTF be considered?
Mail body is decoded (quoted-printable, etc.), but no normalization is done.
> If type 4 is for only 2821 mail format, is Type 7
> for all "text and script files" including mail?
>
Type 7 is applied only if the file isn't detected as the HTML filetype
(type 3).
> Type 5 I assume are binary files such as jpg, png, tiff, swf mov, etc?
>
Yes.
> Type 7 What does normalized mean? What happens for characters
> above 127 or for UTF? Line terminators (\r,\r\n,\n)?
Whitespace is normalized, ASCII characters <127 are output, everything
else is stripped from the output.
Hence it works on UTF16/32 variants too.
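
A Python sketch of why stripping works for UTF-16 input. This is an approximation, not the actual normalizer: the function name is invented, and it assumes that NUL and other control bytes are dropped along with bytes above 126.

```python
def normalize_text(data: bytes) -> bytes:
    """Keep printable ASCII, collapse whitespace runs to one space, drop the rest."""
    out = bytearray()
    for b in data:
        if b in b" \t\r\n\x0b\x0c":
            # Whitespace: emit a single space per run.
            if not out or out[-1] != 0x20:
                out.append(0x20)
        elif 0x20 < b < 0x7F:
            # Printable ASCII is output unchanged.
            out.append(b)
        # Anything else (NUL, control bytes, bytes >= 0x7F) is stripped.
    return bytes(out)

# The NUL padding of UTF-16 disappears, so the ASCII text survives.
print(normalize_text("Hello  World\r\n".encode("utf-16-le")))
print(normalize_text("caf\u00e9 x".encode("utf-8")))
```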
> Does this type
> get applied to Mail as well? What other files is it applied to?
>
Applied to any file detected as text whose size doesn't exceed a
threshold.
> Clamdocs specify clam having special processing for Office,
> RTF and PDF as well as HTML yet there are no "normalized" nor
> non-"normalized"types for these file formats.
>
Office files: VBA extracted => type 2
RTF: embedded files extracted => no special type needed, type 1 sigs can
be used for executables, etc.
Type 3 is normalized HTML.
> I assume that signatures of these types are applied to both
> uncompressed and compressed versions of the file.
>
You mean stored inside zip, rar, etc.? Files are extracted from archives
first, then each archive member is scanned with the appropriate
signatures for its filetype.
> Wildcards
>
> Would be nice to have a wildcard that allowed a range of
> matching like regex *{6,8}
>
There are already range wildcards: {6-8}
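
If you are used to regexes, the following Python sketch maps a small subset of the hex signature syntax (hex bytes, ??, * and {n-m}) onto an equivalent byte regex. It only illustrates the matching semantics; sig_to_regex is a made-up helper, and this is not how the engine matches signatures internally.

```python
import re

# One token: a {n-m} range, a ?? single-byte wildcard, a * wildcard, or a hex byte.
_TOKEN = re.compile(r"\{(\d+)-(\d+)\}|\?\?|\*|[0-9a-fA-F]{2}")

def sig_to_regex(sig: str):
    """Translate a simplified hex signature body into a compiled bytes regex."""
    parts = []
    pos = 0
    while pos < len(sig):
        m = _TOKEN.match(sig, pos)
        if m is None:
            raise ValueError("unsupported token at offset %d" % pos)
        tok = m.group(0)
        if tok == "??":
            parts.append(b".")        # exactly one arbitrary byte
        elif tok == "*":
            parts.append(b".*?")      # any number of bytes
        elif tok.startswith("{"):
            # {n-m}: between n and m arbitrary bytes, like regex .{n,m}
            parts.append(b".{%d,%d}" % (int(m.group(1)), int(m.group(2))))
        else:
            parts.append(re.escape(bytes.fromhex(tok)))
        pos = m.end()
    return re.compile(b"".join(parts), re.DOTALL)

rx = sig_to_regex("41{6-8}42")
print(bool(rx.search(b"A123456B")), bool(rx.search(b"A12B")))
```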
> Would be nice to be able to have wildcards to match ascii
> numbers and ascii letters.
>
>
You can use (30|31|32|33|34|35|36|37|38|39) for numbers, and signatures
for letters can be constructed using multiple signatures in a logical
signature.
However, if it's any letter, why not just skip over it with ?? instead?
What is the added benefit of matching [a-zA-Z]?
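
If typing such alternations by hand gets tedious, they are easy to generate. A small Python sketch follows; hex_alternation is just a helper name made up for this mail.

```python
def hex_alternation(chars: str) -> str:
    """Build a '(xx|yy|...)' alternation of hex byte values for ASCII characters."""
    for c in chars:
        if ord(c) >= 0x80:
            raise ValueError("not an ASCII character: %r" % c)
    return "(" + "|".join("%02x" % ord(c) for c in chars) + ")"

# Reproduces the digit alternation shown above.
digits = hex_alternation("0123456789")
print(digits)
```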
> Thanks for any and all clarifications and insights.
>
> For clamav/sourcefire folks, if any answer to above is no, could you
> consider adding in the future?
>
Please open a bug report marked as enhancement if you want a feature that
is not implemented, and describe why that feature would be useful.
Best regards,
--Edwin