> Hi Patrik,
>
> I just pulled that message up. One thing that I notice is that the
> number of corpusfed spam messages is 100X the number of non-spam
> messages. That is a recipe for poor performance. The recommendation
> is 1-2k messages of each type. Try adding 1500 spam messages to your
> training corpus. My mental picture for this poor tagging is that
> each corpus, spam and not-spam, acts as one of two orthogonal comb filters. If
> you only use 1 dimension you get much poorer discrimination than
> if you use both dimensions. In other words, it is key to know not
> only what is bad mail but also what is good mail. The training
> mode should be selected to keep the on-going training balanced as
> well. I hope that this helps.
>
> Regards,
> Ken

First of all, thanks for all the replies! Great response from the mailing  
list.

Bear in mind that I don't have any experience with dspam other than  
installing it (which was done 1-2 years ago, I guess).
From the beginning I fed it a lot of old spam but not  
much non-spam, as you noticed. I didn't realize that would have an  
impact. But hasn't the spam I fed it initially become superfluous by now?  
Since installation dspam has been trained regularly. I mean dspam  
worked a lot better like 6-8 months ago.

-Patrik

------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
