https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6087





--- Comment #5 from Mark Martinec <[email protected]>  2009-03-20 12:03:00 PST ---
> No mass-check, but I do have large logs, listing all rule hits of all
> our processed mail, and my 25_yg.cf was in use, so I do have all the
> necessary data for domains mentioned there [...] I'll prepare some stats

I converted logs of the past 12 weeks to a mass-check format. As classification
was automatic, I ditched everything with scores between 2 and 6.2 to reduce
the likelihood of misclassification, then labeled low-scoring mail as ham and
high-scoring mail as spam. Here is what hit-frequencies has to say about the
rules dealing with the domains in question:

OVERALL    SPAM      HAM      S/O    RANK  SCORE  NAME
4058896  3465768   593128    0.854   0.00   0.00  (all messages)

just rules checking on a DKIM/DK signature:
 111049   110361      688    0.965   0.68   0.00  NOTVALID_YAHOO
    363      363        0    1.000   0.61   0.00  NOTVALID_PAYPAL
    274      274        0    1.000   0.60   0.00  NOTVALID_EBAY
  41673    39778     1895    0.782   0.55   0.00  NOTVALID_GMAIL

remaining related rules:
  18519    18519        0    1.000   0.92   0.00  MSGID_YAHOO_CAPS
  15476    15476        0    1.000   0.91   0.00  FORGED_MSGID_YAHOO
  39942    39795      147    0.979   0.79   0.00  REPTO_QUOTE_YAHOO
  61157    60927      230    0.978   0.76   0.00  FORGED_YAHOO_RCVD
    881      879        2    0.987   0.69   0.00  SARE_EBAY_SPOOF_NAME
    304      304        0    1.000   0.60   0.00  SARE_FORGED_PAYPAL_C
    234      234        0    1.000   0.58   0.00  SARE_FORGED_PAYPAL
     32       32        0    1.000   0.51   0.00  ZMIde_EBAYJOBSURI
     21       21        0    1.000   0.50   0.00  SARE_FORGED_EBAY
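
A minimal Python sketch of the corpus preparation step and of the S/O column.
The score band (2 to 6.2) and the corpus sizes come from the text and table;
the helper names are hypothetical, and treating the band boundaries as
discarded is an assumption. The S/O formula below (spam hit rate divided by
the sum of spam and ham hit rates) is also an assumption about how
hit-frequencies computes it, but it reproduces the S/O values printed above.

```python
from typing import Optional

# Corpus sizes from the "(all messages)" row of the table.
N_SPAM = 3465768
N_HAM = 593128

# Score band from the text; boundary handling is an assumption.
HAM_BELOW = 2.0
SPAM_ABOVE = 6.2


def label(score: float) -> Optional[str]:
    """Auto-classify one logged message by its SpamAssassin score.

    Returns "ham" below HAM_BELOW, "spam" above SPAM_ABOVE and None for
    the ambiguous band in between, which is dropped from the corpus.
    """
    if score < HAM_BELOW:
        return "ham"
    if score > SPAM_ABOVE:
        return "spam"
    return None


def s_o(spam_hits: int, ham_hits: int) -> float:
    """Spam hit rate divided by the sum of spam and ham hit rates."""
    spam_rate = spam_hits / N_SPAM
    ham_rate = ham_hits / N_HAM
    return spam_rate / (spam_rate + ham_rate)


print(label(-1.3), label(4.0), label(12.5))  # ham None spam
print(round(s_o(110361, 688), 3))   # 0.965 (NOTVALID_YAHOO)
print(round(s_o(39778, 1895), 3))   # 0.782 (NOTVALID_GMAIL)
print(round(s_o(363, 0), 3))        # 1.0 (NOTVALID_PAYPAL)
```

Because S/O is computed from rates rather than raw counts, the large spam
majority in this corpus does not inflate it: about 95% of NOTVALID_GMAIL hits
are spam in raw counts (39778 of 41673), yet its S/O is only 0.782.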

The NOTVALID_PAYPAL and NOTVALID_EBAY rules check all mail claiming to be
from these domains, while NOTVALID_YAHOO and NOTVALID_GMAIL ignore mail
which appears to be coming through a mailing list (see 25_yg.cf for details).
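
The difference in scope between the two pairs of rules can be modeled as
below. This is only an illustration under stated assumptions: the real rules
live in 25_yg.cf, the domain sets are simplified to one domain each, and the
`Message` fields and `notvalid` helper are hypothetical names, not SpamAssassin
APIs. How the real rules recognize mailing-list traffic is not described here,
so it is reduced to a single boolean.

```python
from dataclasses import dataclass

# Hypothetical, simplified domain sets (the real rules may cover more domains).
# Rules for these domains fire on any claimed mail lacking a valid signature.
STRICT_DOMAINS = {"paypal.com", "ebay.com"}
# Rules for these domains skip mail that appears to come via a mailing list.
LIST_TOLERANT_DOMAINS = {"yahoo.com", "gmail.com"}


@dataclass
class Message:
    from_domain: str        # domain the mail claims to be from (lowercase)
    valid_signature: bool   # True if a valid DKIM/DK signature for it exists
    via_mailing_list: bool  # True if the mail appears to come through a list


def notvalid(msg: Message) -> bool:
    """Return True if a NOTVALID_*-style rule would hit (illustrative model)."""
    if msg.valid_signature:
        return False
    if msg.from_domain in STRICT_DOMAINS:
        return True
    if msg.from_domain in LIST_TOLERANT_DOMAINS:
        return not msg.via_mailing_list
    return False


print(notvalid(Message("paypal.com", False, True)))  # True
print(notvalid(Message("gmail.com", False, True)))   # False
```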

So NOTVALID_PAYPAL and NOTVALID_EBAY together caught 637 scams (363 + 274)
and did not hit any ham. I double-checked the figure on the entire corpus
(including mail with scores between 2 and 6.2), so I know I didn't miss any
case due to auto-classification. Of course, many of these scams would score
high enough on other rules too, so checking the DKIM signature for eBay and
PayPal is just an additional safety fuse.

NOTVALID_YAHOO and NOTVALID_GMAIL are another story: they hit many ham
messages too. Nevertheless, I find it worthwhile to assign 2.5 points
to each.
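
The figures behind the last two paragraphs follow directly from the table:
the eBay and PayPal rules together hit 637 spam messages and no ham, while the
Yahoo and Gmail rules hit roughly 0.12% and 0.32% of the ham corpus. A short
Python sketch of that arithmetic, with counts copied from the table and
hypothetical variable names:

```python
N_SPAM = 3465768
N_HAM = 593128

# (spam hits, ham hits) per rule, copied from the hit-frequencies table.
HITS = {
    "NOTVALID_YAHOO": (110361, 688),
    "NOTVALID_PAYPAL": (363, 0),
    "NOTVALID_EBAY": (274, 0),
    "NOTVALID_GMAIL": (39778, 1895),
}


def rates(rule: str) -> tuple:
    """Percentage of the spam and of the ham corpus that a rule hits."""
    spam, ham = HITS[rule]
    return 100.0 * spam / N_SPAM, 100.0 * ham / N_HAM


strict = ("NOTVALID_PAYPAL", "NOTVALID_EBAY")
scams = sum(HITS[r][0] for r in strict)
ham_fp = sum(HITS[r][1] for r in strict)
print(scams, ham_fp)  # 637 0

for rule in ("NOTVALID_YAHOO", "NOTVALID_GMAIL"):
    spam_pct, ham_pct = rates(rule)
    print(f"{rule}: {spam_pct:.2f}% of spam, {ham_pct:.2f}% of ham")
```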

According to my stats in Bug 5891, the average score of signed vs. unsigned
mail from yahoo and gmail is very different. In other words, spammers claiming
to be from yahoo or gmail rarely send their junk through the domain's servers,
while many of the regular users do:

  yahoo.com  not signed,  avg.score= 14.8
  yahoo.com  valid sign., avg.score= -0.7

  gmail.com  not signed,  avg.score=  2.9
  gmail.com  valid sign., avg.score= -3.3
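
The comparison above amounts to grouping messages by sender domain and
signature status and averaging their scores. A minimal sketch of that
grouping, assuming one `(domain, has_valid_signature, score)` tuple per
message and collapsing signature status to valid vs. not; the
`average_scores` name and the sample records are hypothetical placeholders,
not data from Bug 5891.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# (from_domain, has_valid_signature, score); a hypothetical record shape.
Record = Tuple[str, bool, float]


def average_scores(records: Iterable[Record]) -> Dict[Tuple[str, bool], float]:
    """Average score per (domain, signature status) group."""
    groups = defaultdict(list)
    for domain, signed, score in records:
        groups[(domain, signed)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}


# Illustrative placeholder records only.
sample = [
    ("gmail.com", True, -3.0),
    ("gmail.com", True, -1.0),
    ("gmail.com", False, 8.0),
    ("gmail.com", False, 0.0),
]

for (domain, signed), avg in sorted(average_scores(sample).items()):
    status = "valid sign.," if signed else "not signed, "
    print(f"{domain}  {status} avg.score= {avg:.1f}")
```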

