>-----Original Message-----
>From: Jeff Chan [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, May 11, 2005 3:59 AM
>To: Spamassassin Devel List
>Subject: Re: ws.surbl.org scores before and after Chris Santerre data gone?
>
>
>On Tuesday, May 10, 2005, 1:34:27 PM, Theo Dinter wrote:
>>> > On Mon, May 02, 2005 at 03:16:11PM -0700, Jeff Chan wrote:
>>> >> If so can you provide a before and after ham/spam summary as of
>>> >> say a week ago and now-ish?
>>> 
>>> > For SpamAssassin, our last weekly run (does net checks) failed due to a code
>>> > issue.  Oops.  In theory, the next weekly run (on Saturdays) should occur and
>>> > we can compare it to the results from 2 weeks ago.
>
>> Ok, the two run sizes are a bit different, but you can get general stats from
>> this I think:
>
>> Previous run (4/23):
>
>> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>  182168   109007    73161    0.598   0.00    0.00  (all messages)
>>  12.750  21.3041   0.0055    1.000   0.99    0.00  URIBL_SC_SURBL
>>  36.463  60.9135   0.0328    0.999   0.98    0.00  URIBL_JP_SURBL
>>   9.809  16.3843   0.0109    0.999   0.97    0.00  URIBL_AB_SURBL
>>  36.982  61.7355   0.1011    0.998   0.89    0.00  URIBL_WS_SURBL
>>  38.506  64.2683   0.1203    0.998   0.87    0.00  URIBL_OB_SURBL
>>   0.211   0.3532   0.0000    1.000   0.66    0.00  URIBL_PH_SURBL
>
>> Latest run (5/8):
>
>> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>  339239   240537    98702    0.709   0.00    0.00  (all messages)
>> 100.000  70.9049  29.0951    0.709   0.00    0.00  (all messages as %)
>>  13.111  18.4895   0.0020    1.000   0.98    0.00  URIBL_SC_SURBL
>>  37.333  52.6451   0.0172    1.000   0.98    0.00  URIBL_JP_SURBL
>>   8.836  12.4600   0.0041    1.000   0.97    0.00  URIBL_AB_SURBL
>>  38.140  53.7672   0.0567    0.999   0.91    0.00  URIBL_OB_SURBL
>>  40.770  57.4652   0.0841    0.999   0.87    0.00  URIBL_WS_SURBL
>>   0.215   0.3035   0.0000    1.000   0.61    0.00  URIBL_PH_SURBL
>
>Thanks.  The differing corpus sizes make it difficult to
>compare however.  For example the 5/8 spam count is more than
>double, but the ham count is like 35% more.  Therefore the
>percentages are not directly comparable.
>
>Assuming the percentages in the SPAM and HAM columns represent
>percentages of hits within those columns, then here are the
>HAM percentages multiplied by the ham count at the top of the
>column for the number of ham hits (counts) per list:
>
>4/23
>
>NAME              ham hits?
>URIBL_SC_SURBL           4
>URIBL_JP_SURBL          24
>URIBL_AB_SURBL           8
>URIBL_WS_SURBL          74
>URIBL_OB_SURBL          88
>URIBL_PH_SURBL           0
>
>
>5/8
>
>NAME              ham hits?
>URIBL_SC_SURBL           2
>URIBL_JP_SURBL          17
>URIBL_AB_SURBL           4
>URIBL_WS_SURBL          83
>URIBL_OB_SURBL          56
>URIBL_PH_SURBL           0
>
>(If my assumption is wrong, please let me know how to correct
>it.)
>
>On a 35% larger ham corpus, WS hit 83 hams versus 74 before.
>In a sense that's a step in the wrong direction, but the
>differing ham corpora make conclusions difficult.
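
For anyone who wants to redo the arithmetic, the percentage-to-count
conversion described above can be sketched roughly as follows. This assumes,
as Jeff does, that the HAM% column is a percentage of the ham total in the
"(all messages)" row. The function name ham_hits and the RUNS table are made
up for illustration; the figures are copied from the two quoted runs.

```python
# Sketch only: turn mass-check HAM% figures into approximate ham hit counts.
# Assumption: HAM% is a percentage of the ham total in "(all messages)".

def ham_hits(ham_pct, ham_total):
    """Approximate number of ham messages a rule hit (hypothetical helper)."""
    return round(ham_pct / 100.0 * ham_total)

# run date -> (ham total, {rule name: HAM%}), copied from the quoted tables
RUNS = {
    "4/23": (73161, {
        "URIBL_SC_SURBL": 0.0055, "URIBL_JP_SURBL": 0.0328,
        "URIBL_AB_SURBL": 0.0109, "URIBL_WS_SURBL": 0.1011,
        "URIBL_OB_SURBL": 0.1203, "URIBL_PH_SURBL": 0.0000,
    }),
    "5/8": (98702, {
        "URIBL_SC_SURBL": 0.0020, "URIBL_JP_SURBL": 0.0172,
        "URIBL_AB_SURBL": 0.0041, "URIBL_WS_SURBL": 0.0841,
        "URIBL_OB_SURBL": 0.0567, "URIBL_PH_SURBL": 0.0000,
    }),
}

for run, (ham_total, rules) in RUNS.items():
    print(run)
    for name, pct in rules.items():
        print(f"{name:<16}{ham_hits(pct, ham_total):>10}")
```

The counts are rounded to the nearest whole message, so they are only as
precise as the four decimal places in the HAM% column.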

Remove my submissions and FPs go up? :) 

Are the hams confirmed? We ran a check on black.uribl.com and found a poor
FP rate. Come to find out, they were all spams that didn't score high enough.
Correcting for that put the numbers right where they should have been.

Jeff, maybe you should temporarily disable other people's submissions, rerun
the test, and see where these FPs are coming from? It's the only way I can
think of to find them.

--Chris
