https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6087
--- Comment #5 from Mark Martinec <[email protected]> 2009-03-20 12:03:00 PST ---
> No mass-check, but I do have large logs, listing all rule hits of all
> our processed mail, and my 25_yg.cf was in use, so I do have all the
> necessary data for domains mentioned there [...] I'll prepare some stats

I converted logs of the past 12 weeks to a mass-check format. As classification was automatic, I ditched everything with scores between 2 and 6.2 to reduce the likelihood of false classification, then I pronounced low-score mail as ham and high-score mail as spam.

The hit-frequencies output has the following to say on rules dealing with the domains in question:

OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE  NAME
 4058896  3465768   593128   0.854    0.00    0.00  (all messages)

just rules checking on a DKIM/DK signature:
  111049   110361      688   0.965    0.68    0.00  NOTVALID_YAHOO
     363      363        0   1.000    0.61    0.00  NOTVALID_PAYPAL
     274      274        0   1.000    0.60    0.00  NOTVALID_EBAY
   41673    39778     1895   0.782    0.55    0.00  NOTVALID_GMAIL

remaining related rules:
   18519    18519        0   1.000    0.92    0.00  MSGID_YAHOO_CAPS
   15476    15476        0   1.000    0.91    0.00  FORGED_MSGID_YAHOO
   39942    39795      147   0.979    0.79    0.00  REPTO_QUOTE_YAHOO
   61157    60927      230   0.978    0.76    0.00  FORGED_YAHOO_RCVD
     881      879        2   0.987    0.69    0.00  SARE_EBAY_SPOOF_NAME
     304      304        0   1.000    0.60    0.00  SARE_FORGED_PAYPAL_C
     234      234        0   1.000    0.58    0.00  SARE_FORGED_PAYPAL
      32       32        0   1.000    0.51    0.00  ZMIde_EBAYJOBSURI
      21       21        0   1.000    0.50    0.00  SARE_FORGED_EBAY

The NOTVALID_PAYPAL and NOTVALID_EBAY rules check all mail claiming to be from these domains, while NOTVALID_YAHOO and NOTVALID_GMAIL ignore mail which appears to be coming through a mailing list (see 25_yg.cf for details).

So NOTVALID_PAYPAL and NOTVALID_EBAY together ditched 637 scams and did not hit on any ham. I double-checked the figure on the entire corpus (including mail with scores between 2 and 6.2), so I know I didn't miss any case due to auto-classification.
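
For anyone who wants to re-check the S/O column from the raw counts: the figures listed for the individual rules appear consistent with S/O being computed from per-class hit rates (the spam hit rate divided by the sum of the spam and ham hit rates), not from raw counts. That definition is an assumption inferred from the numbers in this comment, not taken from the hit-frequencies source, and the function name s_o below is purely illustrative. A minimal Python sketch:

```python
# Corpus totals and per-rule counts copied from the table in this comment.
TOTAL_SPAM = 3465768
TOTAL_HAM = 593128

# rule name -> (spam hits, ham hits)
RULE_HITS = {
    "NOTVALID_YAHOO": (110361, 688),
    "NOTVALID_PAYPAL": (363, 0),
    "NOTVALID_EBAY": (274, 0),
    "NOTVALID_GMAIL": (39778, 1895),
}


def s_o(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Assumed S/O: spam hit rate / (spam hit rate + ham hit rate).

    Hypothetical helper; the formula is inferred from the listed figures.
    """
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    for name, (spam_hits, ham_hits) in RULE_HITS.items():
        print(f"{name:16s} S/O = {s_o(spam_hits, ham_hits):.3f}")

    # Combined spam hits of the two rules that never hit ham.
    scams = RULE_HITS["NOTVALID_PAYPAL"][0] + RULE_HITS["NOTVALID_EBAY"][0]
    print("PAYPAL + EBAY spam hits:", scams)
```

Normalising by class size matters here because the corpus is about 85% spam: a rule hitting spam and ham at equal rates would show an S/O of 0.5 under this definition, however lopsided the raw counts look.
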
Of course many of these scams would score high enough on other rules too, so checking on the DKIM signature in the case of eBay and PayPal is just an additional safety fuse.

NOTVALID_YAHOO and NOTVALID_GMAIL are another story: they hit on many ham messages too. Nevertheless, I find it valuable to assign 2.5 score points to each. According to my stats in Bug 5891, the average scores of signed and unsigned mail from yahoo and gmail are very different. In other words, spammers claiming to be from yahoo or gmail rarely post their junk through the domain's server, while many of the regular users do:

  yahoo.com  not signed,   avg.score= 14.8
  yahoo.com  valid sign.,  avg.score= -0.7
  gmail.com  not signed,   avg.score=  2.9
  gmail.com  valid sign.,  avg.score= -3.3
