https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850


Justin Mason <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #3 from Justin Mason <[email protected]>  2009-07-16 13:51:28 PST ---
ok, I've done this.

svn commit -m "bug 5850: integrate corpus quality report into the rule-QA ap
es/rule-qa/automc/ruleqa.cgi
Sending        masses/rule-qa/automc/ruleqa.cgi
Transmitting file data .
Committed revision 794847 ( 
https://svn.apache.org/viewcvs.cgi?view=rev&rev=794847 ).


If you look at a rule's detail page, e.g.
http://ruleqa.spamassassin.org/20090716-r794596-n/T_CN_URL/detail , there's now
a "corpus" link beside each contributor's name in the "set 0, broken down by
contributor" table.  Click on that, and you'll be brought to the (new) corpus
quality report part of the detail page, which lists the contributors and
attributes of their corpora.  For example, here's today's:

bb-jhardin       Spam messages    Score range    Ham messages     Score range   
  in 2009-02           3   (0%)   [2,23]               0                        
  in 2009-03           4   (0%)   [4,10]               0                        
  in 2009-04           7   (0%)   [4,22]               0                        
  in 2009-05          12   (0%)   [2,21]               0                        
  in 2009-06          39   (0%)   [0,25]               0                        
  in 2009-07           8   (0%)   [1,21]               2   (0%)   [2,4]         
  TOTAL:              73   (0%)   [0,25]               2   (0%)   [2,4]         

From this you can see that John's corpus is pretty recent and pretty small,
with basically no ham and only a little spam.  Sort it out John ;)


bb-jm            Spam messages    Score range    Ham messages     Score range   
  in 2009-01           0                             265   (0%)   [0,5]         
  in 2009-02           0                             376   (0%)   [0,4]         
  in 2009-03           0                             218   (0%)   [-12,4]       
  in 2009-04           0                               2   (0%)   [0,2]         
  in 2009-05           0                               1   (0%)   [0,0]         
  in 2009-06       73845   (7%)   [0,54]               0                        
  in 2009-07       26054   (2%)   [0,53]               0                        
  TOTAL:           99899  (10%)   [0,54]             862   (1%)   [-12,5]       

You can see that my "bb-jm" corpus, the mail I've uploaded for mass-checking,
makes up 10% of the total spam corpus, and 1% of the total ham corpus.  I
haven't uploaded any ham recently (the newest is from 2009-05), and the spam is
all from 2009-06 and 2009-07.
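
As a rough check on those percentages, here is a minimal Python sketch that recomputes each contributor's share from the TOTAL rows of this report. It rests on two assumptions that the report itself does not state: that the five contributors shown make up the whole corpus, and that the percentages are truncated rather than rounded. The names `totals` and `share` are only illustrative and are not taken from ruleqa.cgi.

```python
# TOTAL rows copied from this report: contributor -> (spam, ham) message counts.
totals = {
    "bb-jhardin": (73, 2),
    "bb-jm": (99899, 862),
    "dos": (868956, 30195),
    "jm": (24937, 24965),
    "wtogami": (1794, 1962),
}

# Assumption: these five contributors are the entire corpus.
all_spam = sum(spam for spam, _ in totals.values())
all_ham = sum(ham for _, ham in totals.values())


def share(count, whole):
    """Percentage of the whole, truncated to an integer (assumed behaviour)."""
    return count * 100 // whole


shares = {
    name: (share(spam, all_spam), share(ham, all_ham))
    for name, (spam, ham) in totals.items()
}

for name, (spam_pct, ham_pct) in shares.items():
    print(f"{name:12} spam {spam_pct:3d}%   ham {ham_pct:3d}%")
```

Under those assumptions the sketch gives 10% spam and 1% ham for bb-jm, matching the TOTAL row. Truncation is a guess based on rows such as jm's spam total, which is just over 2.5% of the combined spam yet is shown as 2%.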


dos              Spam messages    Score range    Ham messages     Score range   
  in 2007              0                            5692   (9%)   [-1,10]       
  in 2008              0                           10058  (17%)   [-1,11]       
  in 2008-07           0                             442   (0%)   [-1,6]        
  in 2008-08           0                            1062   (1%)   [-1,8]        
  in 2008-09           0                             829   (1%)   [-1,9]        
  in 2008-10           0                            1051   (1%)   [-1,12]       
  in 2008-11           0                            1256   (2%)   [-1,9]        
  in 2008-12           0                            1384   (2%)   [-1,8]        
  in 2009-01           0                            1752   (3%)   [-1,5]        
  in 2009-02           0                            1171   (2%)   [-1,8]        
  in 2009-03           0                            1422   (2%)   [-1,5]        
  in 2009-04           0                            1214   (2%)   [-1,9]        
  in 2009-05      244774  (24%)   [0,37]            1278   (2%)   [-1,7]        
  in 2009-06      505310  (50%)   [0,37]            1148   (1%)   [-1,6]        
  in 2009-07      118872  (11%)   [0,38]             436   (0%)   [-1,4]        
  TOTAL:          868956  (87%)   [0,38]           30195  (52%)   [-1,12]       

You can see that Daryl's got ham going back to 2007.


jm               Spam messages    Score range    Ham messages     Score range   
  in 2008-10           0                            1859   (3%)   [-14,6]       
  in 2008-11        6549   (0%)   [-12,23]          6339  (10%)   [-14,7]       
  in 2008-12        2702   (0%)   [-12,21]          4446   (7%)   [-14,10]      
  in 2009-01        2740   (0%)   [0,22]            5732   (9%)   [-14,11]      
  in 2009-02        1235   (0%)   [0,20]            4651   (8%)   [-1,10]       
  in 2009-03        2017   (0%)   [0,16]            1914   (3%)   [-1,6]        
  in 2009-04        4735   (0%)   [0,22]              22   (0%)   [0,6]         
  in 2009-05        2079   (0%)   [0,24]               0                        
  in 2009-06        2451   (0%)   [0,16]               0                        
  in 2009-07         429   (0%)   [0,13]               2   (0%)   [2,4]         
  TOTAL:           24937   (2%)   [-12,24]         24965  (43%)   [-14,11]      

wtogami          Spam messages    Score range    Ham messages     Score range   
  in 2005              0                              73   (0%)   [0,5]         
  in 2006              0                              75   (0%)   [0,5]         
  in 2007              0                             123   (0%)   [0,9]         
  in 2008              0                              77   (0%)   [0,10]        
  in 2008-07           0                               5   (0%)   [0,3]         
  in 2008-08           0                              13   (0%)   [0,11]        
  in 2008-09           0                               8   (0%)   [0,7]         
  in 2008-10           0                              13   (0%)   [0,9]         
  in 2008-11           0                              38   (0%)   [0,9]         
  in 2008-12           0                              28   (0%)   [0,9]         
  in 2009-01           4   (0%)   [0,0]               38   (0%)   [-1,10]       
  in 2009-02          41   (0%)   [0,5]               20   (0%)   [0,10]        
  in 2009-03          76   (0%)   [0,9]               50   (0%)   [0,11]        
  in 2009-04          94   (0%)   [0,16]              22   (0%)   [0,10]        
  in 2009-05         418   (0%)   [0,16]             614   (1%)   [-1,10]       
  in 2009-06         504   (0%)   [0,10]             446   (0%)   [-1,12]       
  in 2009-07         657   (0%)   [0,27]             319   (0%)   [-1,10]       
  TOTAL:            1794   (0%)   [0,27]            1962   (3%)   [-1,12]       
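
For anyone who wants to post-process a text copy of these tables, the rows are regular enough to parse. The following is a minimal sketch that assumes the column layout shown in this message (a count, then an optional percentage and score range, once for spam and once for ham). `parse_row` and the regular expressions are hypothetical helpers written for this example; they are not part of ruleqa.cgi.

```python
import re

# One "count (pct%) [lo,hi]" cell; the percentage and range are absent when count is 0.
CELL = r"(\d+)(?:\s+\((\d+)%\)\s+\[(-?\d+),(-?\d+)\])?"
# A data row starts with "in <period>" or "TOTAL:", followed by a spam and a ham cell.
ROW = re.compile(rf"^\s*(?:in\s+(\S+)|(TOTAL):)\s+{CELL}\s+{CELL}\s*$")


def parse_row(line):
    """Parse one report row into a dict, or return None for headers and other text."""
    m = ROW.match(line)
    if m is None:
        return None

    def cell(i):
        count, pct, lo, hi = m.group(i, i + 1, i + 2, i + 3)
        return {
            "count": int(count),
            "pct": int(pct) if pct is not None else None,
            "range": (int(lo), int(hi)) if lo is not None else None,
        }

    return {"period": m.group(1) or m.group(2), "spam": cell(3), "ham": cell(7)}


row = parse_row("  in 2009-03           0                             218   (0%)   [-12,4]       ")
print(row["period"], row["spam"]["count"], row["ham"]["count"], row["ham"]["range"])
```

Contributor header lines such as "bb-jm  Spam messages ..." do not match the pattern and come back as None, so a caller can use them to decide which contributor the following rows belong to.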



I think this will be pretty handy...

-- 
Configure bugmail: 
https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
