https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850
Justin Mason changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #3 from Justin Mason  2009-07-16 13:51:28 PST ---
OK, I've done this:

  svn commit -m "bug 5850: integrate corpus quality report into the rule-QA ap
  es/rule-qa/automc/ruleqa.cgi
  Sending        masses/rule-qa/automc/ruleqa.cgi
  Transmitting file data .
  Committed revision 794847 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=794847 ).

If you look at a rule's detail page, for example

  http://ruleqa.spamassassin.org/20090716-r794596-n/T_CN_URL/detail

there's now a "corpus" link beside each contributor's name in the "set 0, broken down by contributor" table. Clicking it takes you to the new corpus quality report section of the detail page, which lists the contributors and the attributes of their corpora. Each row gives, for one period, the number of spam and ham messages, that count as a percentage of the total spam (or ham) corpus, and the [min,max] range of scores those messages received. For example, here's today's:

bb-jhardin  Spam messages  Score range  Ham messages  Score range
in 2009-02  3 (0%)         [2,23]       0
in 2009-03  4 (0%)         [4,10]       0
in 2009-04  7 (0%)         [4,22]       0
in 2009-05  12 (0%)        [2,21]       0
in 2009-06  39 (0%)        [0,25]       0
in 2009-07  8 (0%)         [1,21]       2 (0%)        [2,4]
TOTAL:      73 (0%)        [0,25]       2 (0%)        [2,4]

From this you can see that John's corpus is pretty recent and pretty small, with basically no ham and only a little spam. Sort it out, John ;)

bb-jm       Spam messages  Score range  Ham messages  Score range
in 2009-01  0                           265 (0%)      [0,5]
in 2009-02  0                           376 (0%)      [0,4]
in 2009-03  0                           218 (0%)      [-12,4]
in 2009-04  0                           2 (0%)        [0,2]
in 2009-05  0                           1 (0%)        [0,0]
in 2009-06  73845 (7%)     [0,54]       0
in 2009-07  26054 (2%)     [0,53]       0
TOTAL:      99899 (10%)    [0,54]       862 (1%)      [-12,5]

You can see that my "bb-jm" corpus, the mail I've uploaded for mass-checking, makes up 10% of the total spam corpus and 1% of the total ham corpus. I haven't uploaded any ham recently, and the spam is very recent.
dos         Spam messages  Score range  Ham messages  Score range
in 2007     0                           5692 (9%)     [-1,10]
in 2008     0                           10058 (17%)   [-1,11]
in 2008-07  0                           442 (0%)      [-1,6]
in 2008-08  0                           1062 (1%)     [-1,8]
in 2008-09  0                           829 (1%)      [-1,9]
in 2008-10  0                           1051 (1%)     [-1,12]
in 2008-11  0                           1256 (2%)     [-1,9]
in 2008-12  0                           1384 (2%)     [-1,8]
in 2009-01  0                           1752 (3%)     [-1,5]
in 2009-02  0                           1171 (2%)     [-1,8]
in 2009-03  0                           1422 (2%)     [-1,5]
in 2009-04  0                           1214 (2%)     [-1,9]
in 2009-05  244774 (24%)   [0,37]       1278 (2%)     [-1,7]
in 2009-06  505310 (50%)   [0,37]       1148 (1%)     [-1,6]
in 2009-07  118872 (11%)   [0,38]       436 (0%)      [-1,4]
TOTAL:      868956 (87%)   [0,38]       30195 (52%)   [-1,12]

You can see that Daryl's got ham going back to 2007.

jm          Spam messages  Score range  Ham messages  Score range
in 2008-10  0                           1859 (3%)     [-14,6]
in 2008-11  6549 (0%)      [-12,23]     6339 (10%)    [-14,7]
in 2008-12  2702 (0%)      [-12,21]     4446 (7%)     [-14,10]
in 2009-01  2740 (0%)      [0,22]       5732 (9%)     [-14,11]
in 2009-02  1235 (0%)      [0,20]       4651 (8%)     [-1,10]
in 2009-03  2017 (0%)      [0,16]       1914 (3%)     [-1,6]
in 2009-04  4735 (0%)      [0,22]       22 (0%)       [0,6]
in 2009-05  2079 (0%)      [0,24]       0
in 2009-06  2451 (0%)      [0,16]       0
in 2009-07  429 (0%)       [0,13]       2 (0%)        [2,4]
TOTAL:      24937 (2%)     [-12,24]     24965 (43%)   [-14,11]

wtogami     Spam messages  Score range  Ham messages  Score range
in 2005     0                           73 (0%)       [0,5]
in 2006     0                           75 (0%)       [0,5]
in 2007     0                           123 (0%)      [0,9]
in 2008     0                           77 (0%)       [0,10]
in 2008-07  0                           5 (0%)        [0,3]
in 2008-08  0                           13 (0%)       [0,11]
in 2008-09  0                           8 (0%)        [0,7]
in 2008-10  0                           13 (0%)       [0,9]
in 2008-11  0                           38 (0%)       [0,9]
in 2008-12  0                           28 (0%)       [0,9]
in 2009-01  4 (0%)         [0,0]        38 (0%)       [-1,10]
in 2009-02  41 (0%)        [0,5]        20 (0%)       [0,10]
in 2009-03  76 (0%)        [0,9]        50 (0%)       [0,11]
in 2009-04  94 (0%)        [0,16]       22 (0%)       [0,10]
in 2009-05  418 (0%)       [0,16]       614 (1%)      [-1,10]
in 2009-06  504 (0%)       [0,10]       446 (0%)      [-1,12]
in 2009-07  657 (0%)       [0,27]       319 (0%)      [-1,10]
TOTAL:      1794 (0%)      [0,27]       1962 (3%)     [-1,12]

I think this will be pretty handy...
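For anyone who wants to produce a similar summary from their own mass-check output, here is a minimal Python sketch of the aggregation these tables appear to reflect: per contributor and per period, count spam and ham, express each count as a share of the whole spam or ham corpus, and track the [min,max] score range. It is not the ruleqa.cgi code (which is Perl and not shown here). The `Result` record, the `Cell` class and `corpus_report` are hypothetical names; the input is assumed to be already tagged with a contributor, a period label and a whole-number score, and truncating percentages (rather than rounding) is also an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical mass-check result, reduced to the fields the report needs."""
    contributor: str  # e.g. "bb-jm"
    bucket: str       # period label chosen by the caller, e.g. "2009-07"
    is_spam: bool
    score: int        # assumed already rounded to whole points


class Cell:
    """Message count and [min,max] score range for one table cell."""

    def __init__(self) -> None:
        self.count = 0
        self.lo: int | None = None
        self.hi: int | None = None

    def add(self, score: int) -> None:
        self.count += 1
        self.lo = score if self.lo is None else min(self.lo, score)
        self.hi = score if self.hi is None else max(self.hi, score)

    def render(self, class_total: int) -> str:
        if self.count == 0:
            return "0"
        # Share of the whole spam (or ham) corpus, truncated to a whole
        # percent; truncating rather than rounding is an assumption.
        pct = self.count * 100 // class_total
        return f"{self.count} ({pct}%) [{self.lo},{self.hi}]"


def corpus_report(results: list[Result]) -> dict[str, list[str]]:
    """Return one list of text rows per contributor, ending in a TOTAL row."""
    spam_total = sum(1 for r in results if r.is_spam)
    ham_total = len(results) - spam_total

    # contributor -> bucket -> [spam cell, ham cell]
    per_bucket = defaultdict(lambda: defaultdict(lambda: [Cell(), Cell()]))
    # contributor -> [spam cell, ham cell] across all buckets
    per_total = defaultdict(lambda: [Cell(), Cell()])
    for r in results:
        idx = 0 if r.is_spam else 1
        per_bucket[r.contributor][r.bucket][idx].add(r.score)
        per_total[r.contributor][idx].add(r.score)

    def row(label: str, cells: list[Cell]) -> str:
        spam, ham = cells
        return f"{label:<12}{spam.render(spam_total):<24}{ham.render(ham_total)}"

    report = {}
    for who in sorted(per_bucket):
        rows = [row(f"in {b}", per_bucket[who][b]) for b in sorted(per_bucket[who])]
        rows.append(row("TOTAL:", per_total[who]))
        report[who] = rows
    return report


if __name__ == "__main__":
    # Illustrative records only, not real corpus data.
    demo = [
        Result("bb-jhardin", "2009-06", True, 0),
        Result("bb-jhardin", "2009-06", True, 25),
        Result("bb-jhardin", "2009-07", False, 2),
        Result("dos", "2009-07", True, 38),
    ]
    for who, rows in corpus_report(demo).items():
        print(who)
        print("\n".join(rows))
```

Keeping the period label as an input leaves the bucketing policy (monthly rows, or whole years for older mail as in the "in 2007" rows above) to whoever builds the records.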
