So here's a quick look at some DomainKeys rule frequencies, from a
mass-check of the last ~10k ham and ~10k spam in my corpus (mass-check
--tail 9999 -j=8 --net --rules '^DK'):

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  19991     9998     9993    0.500   0.00    0.00  (all messages)
100.000  50.0125  49.9875    0.500   0.00    0.00  (all messages as %)
  5.783   0.0500  11.5181    0.004   1.00    0.00  DK_SIGNED
  0.375   0.0100   0.7405    0.013   0.33   -0.10  DK_VERIFIED
  0.000   0.0000   0.0000    0.500   0.33    0.00  DK_POLICY_SIGNALL
  5.613   6.8714   4.3530    0.612   0.00    0.00  DK_POLICY_SIGNSOME
  4.972   6.3013   3.6425    0.634   0.00    0.00  DK_POLICY_TESTING
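For anyone who wants to sanity-check the table, the raw hit counts can be recovered from the percentages and the S/O column recomputed. Here's a rough Python sketch. It assumes S/O is spam% / (spam% + ham%), which reproduces every row above. It also assumes DK_VERIFIED only ever fires on DK_SIGNED messages, so a (DK_SIGNED && !DK_VERIFIED) combination is a plain subtraction. The function names are made up for illustration:

```python
# Rough check of the table above: turn the percentages back into hit
# counts and recompute S/O. Assumes S/O = spam% / (spam% + ham%).
N_SPAM = 9998  # from the "(all messages)" row
N_HAM = 9993

# rule name -> (SPAM%, HAM%), copied from the table
RULES = {
    "DK_SIGNED": (0.0500, 11.5181),
    "DK_VERIFIED": (0.0100, 0.7405),
    "DK_POLICY_SIGNALL": (0.0000, 0.0000),
    "DK_POLICY_SIGNSOME": (6.8714, 4.3530),
    "DK_POLICY_TESTING": (6.3013, 3.6425),
}


def hits(percent: float, total: int) -> int:
    """Convert a hit percentage back into a whole number of messages."""
    return round(percent * total / 100)


def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio: 1.0 means spam-only hits, 0.0 means ham-only."""
    if spam_pct + ham_pct == 0:
        return 0.5  # no hits at all; mirrors the DK_POLICY_SIGNALL row
    return spam_pct / (spam_pct + ham_pct)


for name, (spam_pct, ham_pct) in RULES.items():
    print(f"{name:20s} spam={hits(spam_pct, N_SPAM):5d} "
          f"ham={hits(ham_pct, N_HAM):5d} "
          f"S/O={s_over_o(spam_pct, ham_pct):.3f}")

# (DK_SIGNED && !DK_VERIFIED), assuming every DK_VERIFIED message is also
# DK_SIGNED, so the combination is just a subtraction of counts.
failed_spam = hits(RULES["DK_SIGNED"][0], N_SPAM) - hits(RULES["DK_VERIFIED"][0], N_SPAM)
failed_ham = hits(RULES["DK_SIGNED"][1], N_HAM) - hits(RULES["DK_VERIFIED"][1], N_HAM)
print("signed but not verified:", failed_spam, "spam,", failed_ham, "ham")
```

That works out to 5 signed spams (1 verified) and 1151 signed hams (74 verified). A signed-but-unverified rule would therefore get 4 spam and 1077 ham hits.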


Some notes:

- DK_SIGNED means the message had a DK signature; DK_VERIFIED means
  that signature passed verification.  Most of the failures (signed but
  not verified) are due to the various crud added to all messages in my
  corpus, such as:

  - SpamAssassin markup.  We have a bug open to move this to the start of
    the headers instead of the end, which will fix this for new mail.
    However, we may have to hack a way for the DK plugin to ignore those
    headers in existing corpora; otherwise mass-check figures will be
    really crappy (as above).

  - other crud added: 'Status', 'X-UID', 'X-Keywords' (all added by my
    IMAP server), and 'X-MH-Thread-Markup' (added by my mhthread
    script).

  Problem is, most DK signatures (and, iirc, the signature style the
  draft recommends) sign everything *below* the signature point, on the
  assumption that later hops between sender and receiver will only ever
  *prepend* headers to the existing set, and that verification will take
  place in the recipient's external MX MTA.  My mail has already been
  through a variety of MTAs and both ends of an MDA, so the headers
  appended along the way land inside the signed range.

  FWIW, GMail's DK signatures take a more IIM-ish approach, signing a
  specific set of "important" headers such as From, Subject, To, etc.,
  so virtually all of the DK_VERIFIED hits are from GMail.

- so far, DK_SIGNED is a great ham sign on its own (not that I'm
  suggesting we should use it that way, of course).  The 4 spam mails
  look like they'd pass verification: they're 419 scams sent by hand
  through the Yahoo and GMail webmail interfaces.  (Yes, they really do
  send these by hand.)

- obviously, a rule for "DK verification failed", i.e. (DK_SIGNED &&
  !DK_VERIFIED), would make a lousy anti-spam rule: almost all of its
  hits here are ham.  That may improve a bit if we can figure out a way
  to deal with the "headers appended in transit" issue, but probably not
  a whole lot, given that DK signatures are also broken by mailing lists
  appending footers to the body, etc.

- in terms of rules, (DK_SIGNED && DK_VERIFIED &&
  DOMAIN_IN_SOME_WHITELIST_OR_ANOTHER) seems like the most likely aim,
  but we'll need to figure out how to work around those header manglings
  to get the hit rate anywhere useful.  (A 0.74% ham hit rate isn't
  really worth a DNS lookup.)

- the DK_POLICY rules are there to get an idea of what people are
  publishing in their DK policy records.  Looks like nobody's yet saying
  "we sign all outbound mail" (DK_POLICY_SIGNALL has no hits) ;)

- scan times using just the DK rules (each line is a message count,
  then the scantime in seconds), in spam:

   4693 0
   3722 1
    728 2
    472 3
    190 4
    110 5
     56 6
      9 7
      8 8
     10 9

and in ham:

   6382 0
   2338 1
    799 2
    349 3
    107 4
     11 5
      7 6

(generated with: perl -pe 's/^.*scantime=//; s/,.*$//;' ham.log | sort | uniq -c)
So it's reasonably fast.  (A single DNS lookup takes place for every message.)
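The perl | sort | uniq -c pipeline used for those histograms can also be written as a small Python script, if that's handier. It's a sketch only. It assumes each mass-check log line carries a "scantime=N," field with a whole-number N, as in the figures above, and the script and function names are hypothetical. Unlike the perl one-liner, it skips lines that have no scantime= field (such as comment lines) instead of counting them, and it sorts numerically:

```python
import sys
from collections import Counter
from typing import Iterable


def scantime_histogram(lines: Iterable[str]) -> Counter:
    """Count messages per scantime value, like `perl -pe ... | sort | uniq -c`.

    Mirrors the two perl substitutions: keep what follows the last
    "scantime=", then cut it at the next comma. Lines with no scantime=
    field, or a non-integer value, are skipped rather than counted.
    """
    counts: Counter = Counter()
    for line in lines:
        if "scantime=" not in line:
            continue
        value = line.rsplit("scantime=", 1)[1].split(",", 1)[0].strip()
        if value.isdigit():
            counts[int(value)] += 1
    return counts


def main(path: str) -> None:
    with open(path) as log:
        hist = scantime_histogram(log)
    # Numeric sort, printed in the same "count value" layout as uniq -c.
    for secs in sorted(hist):
        print(f"{hist[secs]:7d} {secs}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Running it as e.g. "python scantimes.py ham.log" prints one "count value" line per scan time, in the same layout as the histograms above.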
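To make the header-appending problem from the first note concrete, here's a toy Python model of what a DK signature covers. It's heavily simplified and hypothetical. The "digest" is a plain SHA-1 of the header text, standing in for DomainKeys' real canonicalization and RSA signing. The h= handling only models the two cases discussed above: either no h= list, so everything below the DomainKey-Signature header is covered, or an explicit GMail-style list of headers. The message and header values are invented. In this model, headers prepended above the signature don't matter either way, but anything appended at the bottom breaks an h=-less signature:

```python
import hashlib
from typing import List, Optional, Tuple

# (name, value) pairs, in top-to-bottom message order.
Header = Tuple[str, str]


def covered_headers(headers: List[Header],
                    h_tag: Optional[List[str]] = None) -> List[Header]:
    """Return the headers this toy model treats as signed.

    Only headers *below* the DomainKey-Signature header are considered.
    With no h= list, all of them are covered; with one, only the listed
    names are (matched case-insensitively).
    """
    names = [name.lower() for name, _ in headers]
    below = headers[names.index("domainkey-signature") + 1:]
    if h_tag is None:
        return below
    wanted = {name.lower() for name in h_tag}
    return [(name, value) for name, value in below if name.lower() in wanted]


def digest(headers: List[Header]) -> str:
    """Stand-in for DK canonicalization + signing: hash the raw header text."""
    text = "".join(f"{name}: {value}\r\n" for name, value in headers)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical message as it left the signer.
original: List[Header] = [
    ("DomainKey-Signature", "a=rsa-sha1; d=example.com"),
    ("From", "alice@example.com"),
    ("To", "bob@example.org"),
    ("Subject", "lunch?"),
]
# A relay prepends a Received header *above* the signature (the DK assumption).
relayed = [("Received", "from relay.example.net")] + original
# Local delivery software appends headers at the *bottom*, as in my corpus.
delivered = relayed + [("X-Spam-Status", "No"), ("X-UID", "42")]

GMAIL_STYLE = ["From", "To", "Subject"]  # hypothetical explicit h= list

for h_tag in (None, GMAIL_STYLE):
    signed = digest(covered_headers(original, h_tag))
    label = "h=" + ":".join(h_tag) if h_tag else "no h="
    for stage, hdrs in (("relayed", relayed), ("delivered", delivered)):
        ok = digest(covered_headers(hdrs, h_tag)) == signed
        print(f"{label:20s} {stage:10s} {'verifies' if ok else 'BROKEN'}")
```

Only the "no h=" case after local delivery comes out BROKEN, which is roughly the pattern in the DK_SIGNED vs. DK_VERIFIED figures.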

--j.