Creating auto-generated rule sets.....

Matt Hampton Fri, 20 Jun 2008 00:34:14 -0700

Justin Mason wrote:

Well, it'd be worth cc'ing the dev list, if that's ok.   With any luck
there'll be future people trying similar stuff and it'll be handy to have
a thread URL to point at ;)

Quick intro - I have been working on automatically generating rules from the Sane Security ClamAV signatures. With a fair bit of help from Justin I have something up and running, so I wanted to share what I have done so far to see what people think and to get some feedback.

I have a small Perl script that extracts the rules from the scam.ndb and phish.ndb files and generates 2 MAMMOTH rulesets (60000 rules!).
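
For illustration only, here is a minimal sketch of the kind of conversion described above. It is written in Python rather than Perl and is not the actual script. It assumes the usual ClamAV .ndb layout of `Name:TargetType:Offset:HexSignature`, skips any signature containing wildcards, and names each rule with an MD5 of the hex string; the function names, the hashing choice and the sample input line are all assumptions made for the example.

```python
import hashlib
import re
import string

# Characters that can appear unescaped inside a /.../ pattern.
SAFE = set(string.ascii_letters + string.digits + " ")


def escape_for_sa(text):
    """Escape text for use inside a SpamAssassin /.../ regex (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
        elif 0x20 < ord(ch) < 0x7F:
            out.append("\\" + ch)              # printable punctuation
        else:
            out.append("\\x%02x" % ord(ch))    # anything non-printable
    return "".join(out)


def ndb_line_to_rule(line, prefix="SANE_", score="0.01"):
    """Return rule text for one .ndb line, or None if the line is not handled."""
    parts = line.strip().split(":")
    if len(parts) < 4:
        return None
    sig_name, _target, _offset, hexsig = parts[:4]
    # Assumption: only plain hex signatures are converted; wildcards are skipped.
    if not re.fullmatch(r"(?:[0-9a-fA-F]{2})+", hexsig):
        return None
    pattern = escape_for_sa(bytes.fromhex(hexsig).decode("latin-1"))
    # Assumption: the rule name is the prefix plus an MD5 of the hex signature.
    rule = prefix + hashlib.md5(hexsig.lower().encode("ascii")).hexdigest()
    return "\n".join([
        "body %s /%s/" % (rule, pattern),
        "describe %s %s" % (rule, sig_name),
        "score %s %s" % (rule, score),
    ])


# Made-up input line whose hex decodes to ".cn/".
print(ndb_line_to_rule("Email.Malware.Sanesecurity.08022207u:4:*:2e636e2f"))
```

Signatures that use ClamAV wildcards would need translating into regex syntax before they could be used; this sketch simply drops them.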


I then run mass-check, followed by hit-frequencies.


Then the selection of rules to import is based on Justin's suggestion:

More or less -- I'd keep it even simpler.  Select if column 2 ("SPAM %
hit") > 0.5, and discard if column 3 ("HAM % hit") > 0.

The reason is, this is an automatically generated ruleset -- avoiding FPs
in auto-generated stuff is critical in my opinion.  Some of those are
pretty bad: an 8.8% false positive rate, ouch!!

The rule of thumb for false positives is that you will only see a fraction
of the "real-world" false positive rate in any measurement, since the
degree of variation between people's ham collections can be very large.
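
The selection criterion quoted above can be sketched as a small filter. This is a rough illustration, not part of the actual toolchain: it assumes whitespace-separated hit-frequencies output with the SPAM% figure in column 2, the HAM% figure in column 3 and the rule name in the last column, and the function name and thresholds-as-parameters are made up for the example.

```python
def select_rules(lines, min_spam_pct=0.5, max_ham_pct=0.0):
    """Keep rule names with SPAM% above min_spam_pct and no ham hits."""
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            spam_pct = float(fields[1])   # column 2: SPAM % hit
            ham_pct = float(fields[2])    # column 3: HAM % hit
        except ValueError:
            continue                      # header or other non-numeric line
        if spam_pct > min_spam_pct and ham_pct <= max_ham_pct:
            selected.append(fields[-1])   # assumed: rule name is the last column
    return selected


# Invented sample figures, only to show the filter at work.
sample = [
    "OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME",
    "  1.200   2.400   0.000   1.000   0.90    0.01  SANE_A",
    "  4.000   3.000   8.800   0.254   0.10    0.01  SANE_B",
    "  0.100   0.200   0.000   1.000   0.50    0.01  SANE_C",
]
print(select_rules(sample))
```

Here SANE_B is dropped because it hits ham at all, and SANE_C because its spam hit rate is below the 0.5 threshold.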

Finally I run a mkrules (it took a while to work out where all the files had to be - either that or I can't read documentation ;-))



And I have a first stab at a ruleset available:

http://www.coders.co.uk/80_sane.cf

I am concerned with the results of some of the rules e.g.

##{ SANE_f48d6d7bf39ebd0b4e830b808d5b45bd
body SANE_f48d6d7bf39ebd0b4e830b808d5b45bd /\.cn\//

describe SANE_f48d6d7bf39ebd0b4e830b808d5b45bd Email.Malware.Sanesecurity.08022207u

score SANE_f48d6d7bf39ebd0b4e830b808d5b45bd 0.01
##} SANE_f48d6d7bf39ebd0b4e830b808d5b45bd

Sorry the rule names are long - I haven't truncated the hash yet!

It isn't automatically updating at the moment, and all of the scores are set to 0.01.


matt
