So here's a quick look at some DomainKeys rule frequencies, from a
mass-check of the last ~10k ham and ~10k spam in my corpus (mass-check
--tail 9999 -j=8 --net --rules '^DK'):
OVERALL%    SPAM%     HAM%    S/O   RANK   SCORE  NAME
   19991     9998     9993  0.500   0.00    0.00  (all messages)
 100.000  50.0125  49.9875  0.500   0.00    0.00  (all messages as %)
   5.783   0.0500  11.5181  0.004   1.00    0.00  DK_SIGNED
   0.375   0.0100   0.7405  0.013   0.33   -0.10  DK_VERIFIED
   0.000   0.0000   0.0000  0.500   0.33    0.00  DK_POLICY_SIGNALL
   5.613   6.8714   4.3530  0.612   0.00    0.00  DK_POLICY_SIGNSOME
   4.972   6.3013   3.6425  0.634   0.00    0.00  DK_POLICY_TESTING
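For reference, the percentage and S/O columns can be recomputed from raw
hit counts. This is a sketch, not the actual SpamAssassin masses code: it
assumes S/O is SPAM% / (SPAM% + HAM%), which reproduces the figures above,
and the hit counts (5 spam / 1151 ham for DK_SIGNED, 1 / 74 for
DK_VERIFIED) are back-computed from the percentages rather than taken from
the logs. `freq_columns` is a hypothetical helper name.

```python
# Hypothetical helper: recompute the OVERALL%, SPAM%, HAM% and S/O columns
# of the frequency table from raw per-rule hit counts.
N_SPAM, N_HAM = 9998, 9993  # corpus sizes from the "(all messages)" row


def freq_columns(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int):
    spam_pct = 100.0 * spam_hits / n_spam
    ham_pct = 100.0 * ham_hits / n_ham
    overall_pct = 100.0 * (spam_hits + ham_hits) / (n_spam + n_ham)
    # Assumed S/O definition: share of the per-class normalised hit rate
    # that comes from spam; 0.5 when the rule never fires.
    total = spam_pct + ham_pct
    so = spam_pct / total if total else 0.5
    return overall_pct, spam_pct, ham_pct, so


# Hit counts back-computed from the percentages in the table (assumption).
for name, spam_hits, ham_hits in [("DK_SIGNED", 5, 1151), ("DK_VERIFIED", 1, 74)]:
    overall, spam, ham, so = freq_columns(spam_hits, ham_hits, N_SPAM, N_HAM)
    print(f"{overall:8.3f} {spam:8.4f} {ham:8.4f} {so:6.3f}  {name}")
```

Because S/O is computed from per-class percentages, it isn't skewed by the
relative sizes of the spam and ham corpora (which are nearly equal here
anyway).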
Some notes:
- DK_SIGNED means the message had a DK signature; DK_VERIFIED means
  that the signature verified. Most of the failures are due to the
  various crud added to every message in my corpus, such as:
  - SpamAssassin markup. We have a bug open to move this to the start
    of the headers, instead of the end, which will fix this. However,
    we may have to hack a way to ignore those headers in the DK plugin
    for existing corpora, otherwise mass-check figures will be really
    crappy (as above).
  - other crud: 'Status', 'X-UID' and 'X-Keywords' (all added by my
    IMAP server), and 'X-MH-Thread-Markup' (added by my mhthread
    script).
  The problem is that most DK signatures (and, iirc, the style of
  signature recommended in the draft) cover everything *below* the
  signature header, on the assumption that further hops from the
  sender to the receiver will only ever *prepend* headers to the
  existing set, and that verification will take place in the
  recipient's external-MX MTA. My mail has already been through a
  variety of MTAs and both ends of an MDA.
  FWIW, GMail's DK signatures take a more IIM-ish approach of signing a
  specific set of "important" headers like From, Subject, To et al.,
  so virtually all of the DK_VERIFIED hits are from GMail.
- so far DK_SIGNED is a great ham sign on its own (not that I'm
  suggesting we should use that, of course). The 4 spam mails look like
  they'd pass verification: they're 419 spams sent by hand through
  Yahoo's and GMail's webmail interfaces. (Yes, they do these by hand.)
- obviously, a rule for "DK verification failed", i.e. (DK_SIGNED &&
  !DK_VERIFIED), would make a lousy anti-spam rule: almost all of its
  hits here are ham. That may clear up a bit if we can figure out a
  way to deal with the "headers appended in passage" issue, but
  possibly not a whole lot, given that DK signatures are also broken
  by mailing lists appending footers to the body, etc.
- in terms of rules, (DK_SIGNED && DK_VERIFIED &&
  DOMAIN_IN_SOME_WHITELIST_OR_ANOTHER) seems like the most likely
  aim, but we'll need to figure out how to fix those header manglings
  to get the hit rate anywhere useful. (DK_VERIFIED's 0.74% ham hit
  rate isn't really worth a DNS lookup.)
- the DK_POLICY ones are to get an idea of what people are publishing
  in their DK records. DK_POLICY_SIGNALL has no hits at all, so it looks
  like nobody's yet saying "we sign all outbound mail" ;)
- scan times using just the DK rules (number of messages, then
  scantime in seconds), in spam:
     4693 0
     3722 1
      728 2
      472 3
      190 4
      110 5
       56 6
        9 7
        8 8
       10 9
  and in ham:
     6382 0
     2338 1
      799 2
      349 3
      107 4
       11 5
        7 6
  (generated with: perl -pe 's/^.*scantime=//; s/,.*$//;' ham.log | sort | uniq -c)
  So it's reasonably fast. (A single DNS lookup takes place on every message.)
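The "headers appended in passage" problem from the first note can be
illustrated with a small model of which headers a DomainKeys signature
covers. This is a sketch, assuming the behaviour described above (no h=
tag: every header below the DomainKey-Signature header is signed; h= tag:
only the listed headers are), and it ignores canonicalization and the body
entirely. `signed_headers`, `strip_local_markup` and the sample headers are
hypothetical, and stripping local headers is just one possible shape for the
"ignore those hdrs" hack, not what the DK plugin actually does.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]  # (field name, value), in message order, top first


def signed_headers(headers: List[Header], h_tag: Optional[List[str]]) -> List[Header]:
    """Headers covered by a DomainKeys signature (simplified model).

    With no h= tag, every header below the DomainKey-Signature header is
    covered; with an h= tag, only the listed field names below it are.
    Canonicalization and the body hash are ignored.
    """
    names = [name.lower() for name, _ in headers]
    below = headers[names.index("domainkey-signature") + 1:]
    if h_tag is None:
        return below
    wanted = {name.lower() for name in h_tag}
    return [(n, v) for n, v in below if n.lower() in wanted]


# Hypothetical local-crud filter, using the headers from the notes above.
LOCAL_HEADERS = {"status", "x-uid", "x-keywords", "x-mh-thread-markup"}


def strip_local_markup(headers: List[Header]) -> List[Header]:
    return [(n, v) for n, v in headers
            if n.lower() not in LOCAL_HEADERS and not n.lower().startswith("x-spam-")]


# Made-up message; example.com / example.net are placeholders.
original = [
    ("Received", "from sender by mx.example.net"),
    ("DomainKey-Signature", "a=rsa-sha1; q=dns; c=nofws; s=sel; d=example.com; b=AAAA"),
    ("From", "alice@example.com"),
    ("To", "bob@example.net"),
    ("Subject", "hello"),
]
# A hop that prepends a header (lands above the signature).
prepended = [("Received", "from mx.example.net by imap.example.net")] + original
# Local markup appended at the end (lands below the signature).
appended = original + [("X-Spam-Status", "No"), ("X-UID", "42")]
GMAIL_STYLE = ["From", "To", "Subject"]  # illustrative h= list, not GMail's real one

print(signed_headers(prepended, None) == signed_headers(original, None))    # True
print(signed_headers(appended, None) == signed_headers(original, None))     # False
print(signed_headers(appended, GMAIL_STYLE) == signed_headers(original, GMAIL_STYLE))  # True
print(signed_headers(strip_local_markup(appended), None) == signed_headers(original, None))  # True
```

The third check is why the GMail-style signatures survive the local crud;
the last shows the kind of filter a plugin hack might apply, with the caveat
that it would wrongly drop any such header the sender had actually signed.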
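To put a number on the "DK verification failed" note: assuming DK_VERIFIED
only ever fires on messages that also hit DK_SIGNED, the hits for
(DK_SIGNED && !DK_VERIFIED) are just the difference of the two rules' hit
counts. A rough sketch, using hit counts back-computed from the table's
percentages (5 spam / 1151 ham signed, 1 / 74 verified) and the assumed S/O
definition SPAM% / (SPAM% + HAM%); `signed_not_verified` is a hypothetical
helper.

```python
# Assumption: DK_VERIFIED never fires without DK_SIGNED, so
# hits(DK_SIGNED && !DK_VERIFIED) = hits(DK_SIGNED) - hits(DK_VERIFIED).
N_SPAM, N_HAM = 9998, 9993

# Hit counts back-computed from the table's percentages (not from the logs).
SIGNED = {"spam": 5, "ham": 1151}
VERIFIED = {"spam": 1, "ham": 74}


def signed_not_verified(signed: dict, verified: dict) -> dict:
    """Hit counts for the meta-rule (DK_SIGNED && !DK_VERIFIED)."""
    return {cls: signed[cls] - verified[cls] for cls in ("spam", "ham")}


failed = signed_not_verified(SIGNED, VERIFIED)
spam_pct = 100.0 * failed["spam"] / N_SPAM
ham_pct = 100.0 * failed["ham"] / N_HAM
so = spam_pct / (spam_pct + ham_pct)            # assumed S/O definition
ham_fail_share = failed["ham"] / SIGNED["ham"]  # signed ham that fails to verify

print(f"spam={failed['spam']} ham={failed['ham']} "
      f"SPAM%={spam_pct:.4f} HAM%={ham_pct:.4f} S/O={so:.3f}")
print(f"{100 * ham_fail_share:.1f}% of signed ham fails verification")
```

On those numbers the meta-rule would hit about 10.8% of ham against 0.04%
of spam (an S/O of roughly 0.004), and around 94% of the signed ham fails
verification, which is why it's useless as a spam sign until the
header-mangling issue is dealt with.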
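The scantime histograms above could equally be produced in Python. A
sketch, assuming each mass-check log line carries a scantime=N field with
an integer value, as the perl one-liner implies; unlike `sort | uniq -c`, it
sorts numerically (so a 10 would land after 9 rather than after 1) and skips
lines without the field, such as '#' comments, which the one-liner passes
through unchanged. The sample log lines and `scantime_histogram` are made up
for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log field: "scantime=<integer>", terminated by "," or end of line.
SCANTIME_RE = re.compile(r"scantime=(\d+)")


def scantime_histogram(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (scantime, message count) pairs, sorted numerically."""
    counts: Counter = Counter()
    for line in lines:
        m = SCANTIME_RE.search(line)
        if m:  # lines without the field (e.g. "#" comments) are skipped
            counts[int(m.group(1))] += 1
    return sorted(counts.items())


# Made-up sample lines; real mass-check logs carry more fields.
sample = [
    "# hypothetical header comment",
    "msg1 DK_SIGNED scantime=0,",
    "msg2 scantime=1,",
    "msg3 scantime=0,",
    "msg4 scantime=10,",
]

for secs, count in scantime_histogram(sample):
    # Same "count value" layout as `uniq -c`.
    print(f"{count:7d} {secs}")
```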
--j.