I wrote a couple of scripts that calculate, for each pair of rules, the
ratio of the percentage of false negatives (missed spam) the pair hits to
the percentage of correct negatives (correct non-spam) it hits.  This was
inspired by Marc Perkel's thread, which doesn't actually have anything to
do with Bayes.
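A minimal Python sketch of that calculation, assuming each message has
already been reduced to the set of rule names that hit it (parsing
mass-check logs is left out, and this is not the actual script from this
post).  Pairs with no hits in the correct negatives are skipped here to
avoid dividing by zero; how the original scripts handle them isn't stated.
The rule names A to E in the usage lines are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def pair_percentages(messages):
    """Percentage of `messages` hit by each unordered pair of rules.

    Each message is a set of the rule names that hit it.
    """
    if not messages:
        return {}
    counts = Counter()
    for rules in messages:
        # sorting gives each unordered pair a single canonical (a, b) key
        for pair in combinations(sorted(rules), 2):
            counts[pair] += 1
    return {pair: 100.0 * n / len(messages) for pair, n in counts.items()}


def pair_ratios(false_negatives, correct_negatives):
    """Return (ratio, pair, wrong_pct, right_pct) tuples, highest ratio first."""
    wrong = pair_percentages(false_negatives)
    right = pair_percentages(correct_negatives)
    results = []
    for pair, wrong_pct in wrong.items():
        right_pct = right.get(pair, 0.0)
        if right_pct == 0.0:
            # assumption: pairs that never hit non-spam are skipped (no division by zero)
            continue
        results.append((wrong_pct / right_pct, pair, wrong_pct, right_pct))
    # highest ratio first; ties broken alphabetically by rule pair
    results.sort(key=lambda t: (-t[0], t[1]))
    return results


fn = [{"A", "B"}, {"A", "B", "C"}, {"C"}]       # hypothetical missed spam
cn = [{"A", "B"}, {"C"}, {"D"}, {"E"}]          # hypothetical correct non-spam
for ratio, (a, b), w, r in pair_ratios(fn, cn):
    print(f"{ratio} {a} {b} (wrong {w}% right {r}%)")
```

The print line mimics the output format used in the results further down.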

The most interesting thing I found was a set of good ADVANCE_FEE (and
MONEY_FRAUD) rules that aren't mutable and have a score of 1.  Why aren't
these mutable?  It looks like they would do us more good if they were
included in re-scoring.

        *  1.0 ADVANCE_FEE_3_NEW Appears to be advance fee fraud (Nigerian 419)
        *  1.0 ADVANCE_FEE_4_NEW_MONEY Advance Fee fraud and lots of money
        *  1.0 ADVANCE_FEE_5_NEW_MONEY Advance Fee fraud and lots of money
        *  1.0 ADVANCE_FEE_3_NEW_MONEY Advance Fee fraud and lots of money
        *  1.0 ADVANCE_FEE_2_NEW_MONEY Advance Fee fraud and lots of money
        *  1.0 MONEY_FRAUD_5 Lots of money and many fraud phrases
        *  1.0 MONEY_FRAUD_3 Lots of money and several fraud phrases

A "not nice" (positive-scoring) rule of (RCVD_IN_DNSWL_HI && SPF_FAIL)
might be fun, or we could put !SPF_FAIL in the DNSWL rules.  Ick...
*every* hit for that pair is in the dos corpora, so it's probably not good
to add.  (Daryl, what did you do?)
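Checking whether a candidate combination is spread across contributors or
concentrated in one corpus can be sketched like this.  It assumes each
message is tagged with the name of the corpus it came from; the message
data in the usage lines is made up for illustration, and the function
names are hypothetical.

```python
from collections import Counter


def hits_by_corpus(messages, rules):
    """Count, per corpus, the messages that hit every rule in `rules`.

    `messages` is an iterable of (corpus_name, set_of_rule_names) pairs.
    """
    wanted = set(rules)
    return Counter(corpus for corpus, hit in messages if wanted <= hit)


def top_corpus_share(counts):
    """Fraction of all hits contributed by the single largest corpus (0.0 if none)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


# hypothetical data: both hits of the pair come from one corpus
msgs = [
    ("dos", {"RCVD_IN_DNSWL_HI", "SPF_FAIL"}),
    ("dos", {"RCVD_IN_DNSWL_HI", "SPF_FAIL", "HTML_MESSAGE"}),
    ("other", {"SPF_FAIL"}),
]
counts = hits_by_corpus(msgs, ["RCVD_IN_DNSWL_HI", "SPF_FAIL"])
print(counts, top_corpus_share(counts))
```

A share near 1.0 means the pair's apparent value rests on a single
contributor's mail, which is the situation described above.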

I used the masscheck net corpora marked "Date: 20110924T124459Z",
excluding zmi because of his 98% hit rate on ALL_TRUSTED in his *spam*.
He says he doesn't think this is a misconfiguration.  Also, I didn't
filter for recent emails, as score generation does (I should have).  In
this run I excluded __* rules.
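The corpus preparation described above (dropping a contributor's corpus,
optionally keeping only recent mail, and ignoring __* rules, which are
unscored sub-rules typically used inside meta rules) might look roughly
like this.  The record layout and the age window are assumptions; the
window score generation actually uses isn't given here, so `max_age_days`
is just a parameter.

```python
from datetime import datetime, timedelta


def filter_messages(messages, excluded_corpora=(), newest=None, max_age_days=None):
    """Drop excluded corpora, optionally drop old messages, and strip __* rules.

    `messages` is an iterable of (corpus_name, message_date, set_of_rule_names).
    """
    cutoff = None
    if newest is not None and max_age_days is not None:
        cutoff = newest - timedelta(days=max_age_days)
    kept = []
    for corpus, date, rules in messages:
        if corpus in excluded_corpora:
            continue
        if cutoff is not None and date < cutoff:
            continue
        # __* rules only feed meta rules, so leave them out of the pair counts
        kept.append((corpus, date, {r for r in rules if not r.startswith("__")}))
    return kept


# hypothetical records; 180 days is an arbitrary example window
msgs = [
    ("zmi", datetime(2011, 9, 1), {"ALL_TRUSTED"}),
    ("dos", datetime(2011, 9, 20), {"__FOO", "SPF_FAIL"}),
    ("dos", datetime(2010, 1, 1), {"SPF_FAIL"}),
]
kept = filter_messages(msgs, excluded_corpora={"zmi"},
                       newest=datetime(2011, 9, 24), max_age_days=180)
print(kept)
```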


Taking the first line as an example, ADVANCE_FEE_3_NEW together with
LOTS_OF_MONEY hit 6.007% of false negatives (missed spam, the "wrong"
figure) and only 0.0017% of correct non-spam (the "right" figure).
6.00707 divided by 0.00169265 is about 3548.9, the first column.
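The first column can be reproduced from message counts.  The counts below
are inferred, not stated in the post: the smallest nonzero percentage in
each column is assumed to be a single message, which implies 283 false
negatives and 59079 correct negatives, and 17 missed spams for the 6.007%
figure.

```python
# Inferred (assumption): the smallest nonzero percentage in each column is one message.
SMALLEST_WRONG_PCT = 0.353356890459364
SMALLEST_RIGHT_PCT = 0.00169264882614804

false_negatives = round(100 / SMALLEST_WRONG_PCT)    # total missed spam
correct_negatives = round(100 / SMALLEST_RIGHT_PCT)  # total correct non-spam

wrong_pct = 100 * 17 / false_negatives    # 17 missed spams hit the pair (inferred)
right_pct = 100 * 1 / correct_negatives   # 1 non-spam hit the pair (inferred)
ratio = wrong_pct / right_pct
print(f"{wrong_pct:.4f}% / {right_pct:.8f}% = {ratio:.2f}")
```

This also shows why a single non-spam hit is enough to produce a very
large ratio.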

So you might say that creating a rule to combine them would be good,
except that exactly this has already been done as ADVANCE_FEE_3_NEW_MONEY,
which has a score of 1 that I'm pretty sure should be increased.
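The first two lines of the results have identical percentages, which fits
ADVANCE_FEE_3_NEW_MONEY already covering ADVANCE_FEE_3_NEW plus
LOTS_OF_MONEY.  A rough way to spot such overlap automatically is to look
for single rules whose hit set equals the pair's hit set; this is a
hypothetical helper, and equal hit sets only show the rules coincide on
this corpus, not how any rule is defined.

```python
def covering_rules(messages, rule_a, rule_b):
    """Rules (other than the pair) that hit exactly the messages hit by both rules.

    `messages` is a list of sets of rule names.
    """
    both = {i for i, hit in enumerate(messages) if rule_a in hit and rule_b in hit}
    if not both:
        return []
    # only rules that fired on the pair's messages can possibly match
    candidates = set().union(*(messages[i] for i in both)) - {rule_a, rule_b}
    return sorted(
        r for r in candidates
        if {i for i, hit in enumerate(messages) if r in hit} == both
    )


# hypothetical messages
msgs = [
    {"ADVANCE_FEE_3_NEW", "LOTS_OF_MONEY", "ADVANCE_FEE_3_NEW_MONEY", "HTML_MESSAGE"},
    {"ADVANCE_FEE_3_NEW"},
    {"LOTS_OF_MONEY", "HTML_MESSAGE"},
]
print(covering_rules(msgs, "ADVANCE_FEE_3_NEW", "LOTS_OF_MONEY"))
```

HTML_MESSAGE is rejected because it also hits the third message.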

3548.91519434629 ADVANCE_FEE_3_NEW LOTS_OF_MONEY (wrong 6.00706713780919% right 0.00169264882614804%)
3548.91519434629 ADVANCE_FEE_3_NEW ADVANCE_FEE_3_NEW_MONEY (wrong 6.00706713780919% right 0.00169264882614804%)
2713.87632508834 RCVD_IN_DNSWL_HI SPF_FAIL (wrong 4.59363957597173% right 0.00169264882614804%)
2713.87632508834 DKIM_VALID_AU HTML_MIME_NO_HTML_TAG (wrong 4.59363957597173% right 0.00169264882614804%)
2087.59717314488 ADVANCE_FEE_2_NEW_MONEY HTML_MESSAGE (wrong 3.53356890459364% right 0.00169264882614804%)
1670.0777385159 DOS_RCVD_IP_TWICE_B RDNS_NONE (wrong 2.82685512367491% right 0.00169264882614804%)
1670.0777385159 DOS_RCVD_IP_TWICE_B HTML_MESSAGE (wrong 2.82685512367491% right 0.00169264882614804%)
1356.93816254417 HTML_MIME_NO_HTML_TAG RCVD_IN_DNSWL_HI (wrong 4.59363957597173% right 0.00338529765229608%)
1356.93816254417 FREEMAIL_FROM MIME_HTML_ONLY (wrong 4.59363957597173% right 0.00338529765229608%)
1356.93816254417 DKIM_VALID HTML_MIME_NO_HTML_TAG (wrong 4.59363957597173% right 0.00338529765229608%)
1356.93816254417 DKIM_SIGNED HTML_MIME_NO_HTML_TAG (wrong 4.59363957597173% right 0.00338529765229608%)
1252.55830388693 ADVANCE_FEE_4_NEW RP_MATCHES_RCVD (wrong 2.12014134275618% right 0.00169264882614804%)
1252.55830388693 ADVANCE_FEE_4_NEW RCVD_IN_DNSWL_NONE (wrong 2.12014134275618% right 0.00169264882614804%)
1182.97173144876 ADVANCE_FEE_4_NEW LOTS_OF_MONEY (wrong 6.00706713780919% right 0.00507794647844412%)
1182.97173144876 ADVANCE_FEE_3_NEW_MONEY LOTS_OF_MONEY (wrong 6.00706713780919% right 0.00507794647844412%)
1043.79858657244 FREEMAIL_FROM HTML_FONT_SIZE_HUGE (wrong 1.76678445229682% right 0.00169264882614804%)
939.418727915194 HTML_FONT_LOW_CONTRAST RCVD_IN_DNSWL_MED (wrong 3.18021201413428% right 0.00338529765229608%)
904.625441696113 FREEMAIL_FROM SPF_FAIL (wrong 4.59363957597173% right 0.00507794647844412%)
835.038869257951 RCVD_IN_DNSWL_NONE TVD_SPACE_RATIO (wrong 1.41342756183746% right 0.00169264882614804%)
709.783038869258 ADVANCE_FEE_2_NEW_MONEY LOTS_OF_MONEY (wrong 6.00706713780919% right 0.00846324413074019%)
695.865724381625 ADVANCE_FEE_4_NEW HTML_MESSAGE (wrong 3.53356890459364% right 0.00507794647844412%)
626.279151943463 FREEMAIL_REPLYTO RP_MATCHES_RCVD (wrong 1.06007067137809% right 0.00169264882614804%)
626.279151943463 FREEMAIL_REPLYTO RCVD_IN_DNSWL_NONE (wrong 1.06007067137809% right 0.00169264882614804%)
626.279151943463 ADVANCE_FEE_3_NEW_MONEY RP_MATCHES_RCVD (wrong 2.12014134275618% right 0.00338529765229608%)
626.279151943463 ADVANCE_FEE_2_NEW_MONEY RCVD_IN_DNSWL_MED (wrong 1.06007067137809% right 0.00169264882614804%)
584.527208480565 HTML_MIME_NO_HTML_TAG MIME_HTML_ONLY (wrong 4.9469964664311% right 0.00846324413074019%)
584.527208480565 HTML_MESSAGE HTML_MIME_NO_HTML_TAG (wrong 4.9469964664311% right 0.00846324413074019%)
521.899293286219 HTML_MESSAGE RCVD_IN_XBL (wrong 1.76678445229682% right 0.00338529765229608%)
417.519434628975 TVD_SPACE_RATIO UNPARSEABLE_RELAY (wrong 1.41342756183746% right 0.00338529765229608%)
417.519434628975 TVD_RCVD_SPACE_BRACKET TVD_SPACE_RATIO (wrong 1.41342756183746% right 0.00338529765229608%)
417.519434628975 MISSING_MID TO_NO_BRKTS_HTML_ONLY (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 MIME_HTML_ONLY TO_NO_BRKTS_HTML_ONLY (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 MIME_BASE64_BLANKS SPF_HELO_PASS (wrong 1.41342756183746% right 0.00338529765229608%)
417.519434628975 HTML_MESSAGE UPPERCASE_50_75 (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 HTML_MESSAGE TO_NO_BRKTS_HTML_ONLY (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 HTML_COMMENT_SAVED_URL RP_MATCHES_RCVD (wrong 1.41342756183746% right 0.00338529765229608%)
417.519434628975 HTML_COMMENT_SAVED_URL HTML_MESSAGE (wrong 1.41342756183746% right 0.00338529765229608%)
417.519434628975 FSL_UA FSL_XM_419 (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 FREEMAIL_REPLYTO HTML_MESSAGE (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 DKIM_VALID FREEMAIL_REPLYTO (wrong 0.706713780918728% right 0.00169264882614804%)
417.519434628975 DKIM_SIGNED FREEMAIL_REPLYTO (wrong 0.706713780918728% right 0.00169264882614804%)
382.726148409894 FREEMAIL_FROM SPF_SOFTFAIL (wrong 3.886925795053% right 0.0101558929568882%)
365.329505300353 HTML_MESSAGE MIME_BASE64_BLANKS (wrong 2.47349823321555% right 0.00677059530459216%)
347.932862190813 ADVANCE_FEE_4_NEW DKIM_SIGNED (wrong 1.76678445229682% right 0.00507794647844412%)
313.139575971731 MIME_HTML_ONLY MISSING_MID (wrong 1.06007067137809% right 0.00338529765229608%)
313.139575971731 ADVANCE_FEE_2_NEW_MONEY RCVD_IN_DNSWL_NONE (wrong 2.12014134275618% right 0.00677059530459216%)
303.650497911982 HTML_MESSAGE SPF_FAIL (wrong 5.65371024734982% right 0.0186191370876284%)
284.672341792483 DKIM_VALID SPF_FAIL (wrong 5.30035335689046% right 0.0186191370876284%)
284.672341792483 DKIM_VALID_AU SPF_FAIL (wrong 5.30035335689046% right 0.0186191370876284%)
278.34628975265 HTML_MESSAGE TO_NO_BRKTS_PCNT (wrong 1.41342756183746% right 0.00507794647844412%)
250.511660777385 ADVANCE_FEE_2_NEW_MONEY RP_MATCHES_RCVD (wrong 2.12014134275618% right 0.00846324413074019%)
208.759717314488 SPF_HELO_PASS SUBJ_ILLEGAL_CHARS (wrong 0.706713780918728% right 0.00338529765229608%)
208.759717314488 RP_MATCHES_RCVD USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RP_MATCHES_RCVD SPF_SOFTFAIL (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RDNS_NONE SUBJ_YOUR_DEBT (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RDNS_NONE SPF_SOFTFAIL (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RCVD_IN_DNSWL_NONE USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RCVD_IN_DNSWL_MED WEIRD_QUOTING (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 RCVD_IN_DNSWL_HI TO_NO_BRKTS_HTML_ONLY (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 NML_ADSP_CUSTOM_MED SPF_PASS (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_MID USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_MID RCVD_IN_DNSWL_HI (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_HEADERS SPF_HELO_PASS (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_DATE SPF_PASS (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_DATE RP_MATCHES_RCVD (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_DATE RCVD_IN_RP_SAFE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MISSING_DATE RCVD_IN_DNSWL_LOW (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_QP_LONG_LINE SPF_NEUTRAL (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_QP_LONG_LINE MISSING_DATE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_HTML_ONLY USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_HTML_ONLY PYZOR_CHECK (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_HTML_ONLY MISSING_DATE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_BASE64_TEXT USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_BASE64_TEXT RCVD_IN_DNSWL_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 MIME_BASE64_TEXT MISSING_MID (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 LOTS_OF_MONEY RCVD_IN_XBL (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 LOTS_OF_MONEY MISSING_MID (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 INVALID_MSGID RDNS_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 INVALID_MSGID RCVD_IN_RP_SAFE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 INVALID_MSGID RCVD_IN_RP_CERTIFIED (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 INVALID_DATE RP_MATCHES_RCVD (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_OBFUSCATE_05_10 RCVD_IN_DNSWL_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_MESSAGE USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_MESSAGE URIBL_SBL (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_MESSAGE MISSING_DATE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_IMAGE_ONLY_20 RCVD_IN_DNSWL_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_IMAGE_ONLY_16 RCVD_IN_DNSWL_HI (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HTML_FONT_SIZE_HUGE LOTS_OF_MONEY (wrong 0.706713780918728% right 0.00338529765229608%)
208.759717314488 HTML_FONT_LOW_CONTRAST SPF_PASS (wrong 1.41342756183746% right 0.00677059530459216%)
208.759717314488 HTML_FONT_LOW_CONTRAST SPF_HELO_PASS (wrong 0.706713780918728% right 0.00338529765229608%)
208.759717314488 HK_NAME_FM_DR RP_MATCHES_RCVD (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 HK_NAME_FM_DR HTML_MESSAGE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_XM_419 RDNS_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_XM_419 NSL_RCVD_FROM_USER (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_UA RDNS_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_UA NSL_RCVD_FROM_USER (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_CTYPE_WIN1251 NSL_RCVD_FROM_USER (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_CTYPE_WIN1251 FSL_XM_419 (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FSL_CTYPE_WIN1251 FSL_UA (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FROM_MISSP_DKIM RDNS_NONE (wrong 0.353356890459364% right 0.00169264882614804%)
208.759717314488 FROM_EXCESS_BASE64 USER_IN_DEF_WHITELIST (wrong 0.353356890459364% right 0.00169264882614804%)


-- 
"Wash daily from nose-tip to tail-tip; drink deeply, but never too deep;
And remember the night is for hunting, and forget not the day is for sleep."
- The Law of the Jungle, Rudyard Kipling
http://www.ChaosReigns.com
