> Hi Patrik,
>
> I just pulled that message up. One thing that I notice is that the
> number of corpus-fed spam messages is 100X the number of non-spam
> messages. That is a recipe for poor performance. The recommendation
> is 1-2k messages of each type. Try adding 1500 non-spam messages to your
> training corpus. My mental picture for this poor tagging is that
> the two corpora, spam and not-spam, are two orthogonal comb filters.
> If you only use 1 dimension you get much poorer discrimination than
> if you use both dimensions. In other words, it is key to know not
> only what is bad mail but also what is good mail. The training
> mode should be selected to keep the on-going training balanced as
> well. I hope that this helps.
>
> Regards,
> Ken

First of all, thanks for all the replies! Great response from the mailing
list.

Bear in mind that I don't have any experience with dspam other than
installing it (which was done 1-2 years ago, I guess). From the beginning
I fed it a lot of old spam but not much non-spam, as you noticed. I
didn't realize that would have an impact.

But hasn't the initially fed spam become superfluous by now? Since
installation, dspam has been trained regularly. I mean, dspam worked a lot
better 6-8 months ago.

-Patrik

------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
