On Wed, Jun 24, 2009 at 09:26:03PM +0200, Patrik Jansson wrote:
>
>> Hi Patrik,
>>
>> I just pulled that message up. One thing I notice is that the
>> number of corpus-fed spam messages is 100X the number of non-spam
>> messages. That is a recipe for poor performance. The recommendation
>> is 1-2k messages of each type. Try adding 1500 non-spam messages to
>> your training corpus. My mental picture for this poor tagging is that
>> the spam and non-spam corpora act as two orthogonal comb filters. If
>> you only use one dimension, you get much poorer discrimination than
>> if you use both. In other words, it is key to know not only what is
>> bad mail but also what is good mail. The training mode should also be
>> selected to keep the ongoing training balanced. I hope that this helps.
>>
>> Regards,
>> Ken
>
> First of all, thanks for all replies! Great response from the mailing list.
>
> Bear in mind that I don't have any experience with DSPAM other than
> installing it (which was done 1-2 years ago, I guess).
> I fed it a lot of old spam from the beginning but not much
> non-spam, as you noticed. I didn't realize that would have an impact. But
> hasn't the initial spam feed become superfluous by now? DSPAM has been
> trained regularly since installation, and it worked a lot better 6-8
> months ago.
>
> -Patrik
>
Patrik,

Check your DSPAM performance tab in the web GUI and look at the totals
processed by the filter. Unless your spam/non-spam split is close to
50/50, you should be using TOE as your training method. With a 75-85%
spam ratio, TUM is sometimes a better choice. TEFT is not usually a
good choice.
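To make the rule of thumb above concrete, here is a minimal sketch in
Python. It assumes you read the spam and innocent totals off the
performance tab by hand. The function names and the 5% tolerance used to
decide what counts as "close to 50/50" are hypothetical and not part of
DSPAM. The returned names are the lower-case mode names used by the
TrainingMode setting in dspam.conf.

```python
from typing import Optional

# Hypothetical tolerance: how far from 50% still counts as a "50/50 split".
BALANCED_TOLERANCE = 0.05


def spam_ratio(spam: int, innocent: int) -> float:
    """Fraction of processed messages that were spam."""
    total = spam + innocent
    if total <= 0:
        raise ValueError("no messages processed yet")
    return spam / total


def suggest_training_mode(spam: int, innocent: int) -> Optional[str]:
    """Apply the rule of thumb from this thread to filter totals.

    Returns "tum" for a 75-85% spam ratio, None for a roughly 50/50
    split (the advice here does not pick a mode for that case), and
    "toe" otherwise.
    """
    ratio = spam_ratio(spam, innocent)
    if abs(ratio - 0.5) <= BALANCED_TOLERANCE:
        return None  # balanced: rule of thumb does not apply
    if 0.75 <= ratio <= 0.85:
        return "tum"  # "in some cases" better; TOE remains an option
    return "toe"


if __name__ == "__main__":
    print(suggest_training_mode(spam=8000, innocent=2000))  # 80% spam
```

Note that TUM is only "sometimes" better in that range, so treat the
"tum" result as a candidate to try rather than a firm recommendation.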

Cheers,
Ken


------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
