On Wed, Jun 24, 2009 at 09:26:03PM +0200, Patrik Jansson wrote:
>
>> Hi Patrik,
>>
>> I just pulled that message up. One thing that I notice is that the
>> number of corpus-fed spam messages is 100X the number of non-spam
>> messages. That is a recipe for poor performance. The recommendation
>> is 1-2k messages of each type. Try adding 1500 non-spam messages to
>> your training corpus. My mental picture for this poor tagging is that
>> the two corpora, spam and not-spam, act as two orthogonal comb
>> filters. If you only use 1 dimension you get much poorer
>> discrimination than if you use both dimensions. In other words, it is
>> key to know not only what is bad mail but also what is good mail. The
>> training mode should be selected to keep the on-going training
>> balanced as well. I hope that this helps.
>>
>> Regards,
>> Ken
>
> First of all, thanks for all replies! Great response from the mailing list.
>
> Bear in mind that I don't have any experience with dspam other than
> installing it (which was done 1-2 years ago, I guess).
> I was feeding it from the beginning with a lot of old spam but not much
> non-spam, as you noticed. Didn't realize that would have an impact. But
> hasn't the spam fed in initially become superfluous by now? Since
> installation dspam has been trained regularly. I mean, dspam worked a lot
> better like 6-8 months ago.
>
> -Patrik
>

Patrik,

Check your DSPAM performance tab in the web GUI and look at the "total
processed by filter" lines. Unless you are at roughly a 50/50 split between
spam and non-spam, you should be using TOE as your training method. In some
cases, with a 75-85% spam ratio, TUM is a better choice. TEFT is not usually
a good choice.

Cheers,
Ken

_______________________________________________
Dspam-user mailing list
https://lists.sourceforge.net/lists/listinfo/dspam-user
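
Ken's rule of thumb for picking a training mode from the spam/non-spam split can be sketched in a few lines of Python. This is only an illustration of the advice in this thread, not anything DSPAM ships: the function name `suggest_training_modes`, the `balance_tolerance` parameter, and the choice to treat 45-55% spam as "50/50" are all assumptions, since the thread gives no exact tolerance. The mode names are DSPAM's own (TOE = train on error, TUM = train until mature, TEFT = train everything).

```python
def suggest_training_modes(spam_count, ham_count, balance_tolerance=0.05):
    """Return candidate DSPAM training modes, most preferred first.

    Hypothetical helper that encodes the rule of thumb from this thread.
    The counts would be read by hand from the performance tab in the web GUI.
    """
    total = spam_count + ham_count
    if total == 0:
        raise ValueError("no messages processed yet")

    spam_ratio = spam_count / total

    # Assumption: "50/50 split" means within balance_tolerance of 50% spam.
    # The thread rules out no mode here, so every mode stays a candidate.
    if abs(spam_ratio - 0.5) <= balance_tolerance:
        return ["TEFT", "TOE", "TUM"]

    # With a 75-85% spam ratio, TUM is sometimes the better choice.
    if 0.75 <= spam_ratio <= 0.85:
        return ["TOE", "TUM"]

    # Any other unbalanced split: use TOE.
    return ["TOE"]


if __name__ == "__main__":
    # A corpus with 100X more spam than non-spam, as in Patrik's setup.
    print(suggest_training_modes(spam_count=150000, ham_count=1500))
```

For a split like Patrik's, with about 100 times more spam than non-spam, the sketch suggests TOE only. It does not address the separate advice to rebalance the initial corpus to 1-2k messages of each type.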
