Hi,

Here are my stats after retraining hundreds of messages, both spam and ham:

{227} dspam_stats -H
antispam:
                 TP True Positives:                  4818
                 TN True Negatives:                 22115
                 FP False Positives:                    4
                 FN False Negatives:                    5
                 SC Spam Corpusfed:                     0
                 NC Nonspam Corpusfed:                  0
                 TL Training Left:                      0
                 SHR Spam Hit Rate                 99.90%
                 HSR Ham Strike Rate:               0.02%
                 PPV Positive predictive value:    99.92%
                 OCA Overall Accuracy:             99.97%
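
For reference, the four percentages at the bottom appear to follow directly from the four counts above them. Here is a minimal Python sketch, assuming the usual confusion-matrix definitions; the function name `derived_rates` is made up for illustration and is not part of dspam:

```python
def derived_rates(tp, tn, fp, fn):
    """Compute the summary percentages from the four raw counts.

    Assumed definitions (standard confusion-matrix ratios, not taken
    from the dspam source):
      SHR = TP / (TP + FN)   share of spam that was caught
      HSR = FP / (TN + FP)   share of ham that was wrongly flagged
      PPV = TP / (TP + FP)   share of flagged mail that really was spam
      OCA = (TP + TN) / all  share of all mail classified correctly
    """
    return {
        "SHR": 100.0 * tp / (tp + fn),
        "HSR": 100.0 * fp / (tn + fp),
        "PPV": 100.0 * tp / (tp + fp),
        "OCA": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }


# Counts copied from the stats above.
rates = derived_rates(tp=4818, tn=22115, fp=4, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

With the counts above this prints 99.90%, 0.02%, 99.92% and 99.97%, matching the output of dspam_stats. The sketch does not guard against a zero denominator, which would occur for a user with no spam or no ham recorded yet.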

Last night it caught maybe 100 emails, but I had many more than that
in my inbox.

Kind Regards,
Al

On Jul 23, 2015, at 12:13 PM, Nathanael D. Noblet wrote:

> On Wed, 2015-07-22 at 17:48 -0700, waterdog wrote:
>
>> The dspam_stats for this user don't look too good even after multiple
>> training attempts:
>>
>>                 TP True Positives:                     0
>>                 TN True Negatives:                     4
>>                 FP False Positives:                 2353
>>                 FN False Negatives:                 1947
>>                 SC Spam Corpusfed:                     0
>>                 NC Nonspam Corpusfed:                  0
>>                 TL Training Left:                    143
>
> You can see from this line that it needs to receive another 143
> messages before it is out of training. It requires about 2500 messages
> before it flips a switch. I can't remember which switch, but it flips
> one.
>
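
As an aside, the 143 is consistent with the quoted counts if the counter runs against ham only: 2500 - (4 + 2353) = 143. The sketch below assumes exactly that. The formula is inferred from the numbers in this thread, not confirmed against the dspam source, and `training_left` is a hypothetical helper:

```python
TRAINING_THRESHOLD = 2500  # "about 2500 messages", per the reply above


def training_left(tn, fp, threshold=TRAINING_THRESHOLD):
    """Estimate the TL counter from the ham counts.

    Assumption: every message that was really ham counts toward the
    threshold, whether it was classified correctly (TN) or wrongly
    flagged as spam (FP).
    """
    ham_seen = tn + fp
    return max(0, threshold - ham_seen)


# Counts copied from the quoted stats (TN = 4, FP = 2353).
print(training_left(tn=4, fp=2353))
```

Under that assumption the sketch reproduces the quoted value of 143, and it gives 0 for both of the other sets of stats in this thread.
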
> When I set myself up years ago, I found a corpus of spam and fed it
> my entire mailbox plus the spam. Now you can see my stats years later:
>
>               TP True Positives:                  3354
>               TN True Negatives:                239849
>               FP False Positives:                 1448
>               FN False Negatives:                  981
>               SC Spam Corpusfed:                     0
>               NC Nonspam Corpusfed:                  0
>               TL Training Left:                      0
>               SHR Spam Hit Rate                 77.37%
>               HSR Ham Strike Rate:               0.60%
>               PPV Positive predictive value:    69.85%
>               OCA Overall Accuracy:             99.01%
>
> You don't have enough data for dspam to reliably do anything.
> Retraining one message as spam will *not* automatically get it to be
> classified as spam on the *next* classification.
>
> Watch the numbers in your stats, which show whether training is
> occurring. If you have a false negative (spam classified as ham), train
> it and you should see the FN increment. If it does, dspam is working as
> expected.
>
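
One way to do that check without eyeballing the columns is to save the `dspam_stats -H` output before and after retraining and compare the two. This is a rough Python sketch, assuming the output looks like the blocks in this thread; `parse_stats` and `counter_changes` are illustrative names, and the "after" figures are invented for the example:

```python
import re

# Matches lines such as "FN False Negatives:   5" or
# "SHR Spam Hit Rate  99.90%", with or without leading "> " quote marks.
STAT_LINE = re.compile(r"^[>\s]*([A-Z]{2,3})\s+.+?\s+(\d+(?:\.\d+)?)%?\s*$")


def parse_stats(text):
    """Turn dspam_stats-style text into a {code: number} dict."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            code, value = match.groups()
            stats[code] = float(value) if "." in value else int(value)
    return stats


def counter_changes(before, after):
    """Return only the counters whose value differs between two runs."""
    return {
        code: after[code] - before[code]
        for code in before
        if code in after and after[code] != before[code]
    }


before = parse_stats("""
    TP True Positives:     4818
    FN False Negatives:       5
""")
# Hypothetical output after retraining one missed spam message.
after = parse_stats("""
    TP True Positives:     4818
    FN False Negatives:       6
""")
print(counter_changes(before, after))
```

If retraining is being recorded, the FN counter should be the one that shows up as changed. An empty result would suggest the retrain never reached dspam.
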
> The other implied part of your question is 'Why isn't dspam effective
> yet?' That is partly due to the amount of mail you've received so
> far, the type of spam, and the dspam settings. I used to set people up
> with TEFT, as that was the recommendation and, I think, the default.
> Over the years I've seen it mentioned on this list multiple times that
> you should use TOE by default.
>
> I also use
>
> Algorithm graham burton
> Tokenizer osb
>
> because users of this list explained back in the day that they were
> better defaults.
>
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>

