On the question of norm optimizing: adjusting MaxFiles or MaxBytes on the 
front end, when messages are collected, doesn't appear to be the best place 
to perform spamdb normalization.

Adjusting MaxFiles to collect more messages in spam or notspam comes with no 
guarantee of effectiveness. In my case I have about 6500 spam messages with 
1.1M words and 2500 notspam messages with 1.6M words. This imbalance is 
probably because the spam vocabulary is very limited compared to the usual 
ham vocabulary. So if I adjust MaxFiles to 6500 (the higher count of the two), 
that won't effectively increase spam words to bring the norm closer to 1, 
because the spam directory would already be at MaxFiles. If anything, it has 
the opposite effect by collecting even more notspam messages (and words).
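
As a rough check of that arithmetic, here is a small sketch. It assumes the 
norm is simply spam words divided by notspam words; that is my assumption, 
and the real formula is whatever rebuildspamdb.pl computes. The projection 
also assumes notspam could actually grow to 6500 messages at its current 
average length.

```python
# Approximate figures quoted in the message above.
spam_msgs, spam_words = 6500, 1_100_000
ham_msgs, ham_words = 2500, 1_600_000


def norm(spam_word_count, ham_word_count):
    # ASSUMPTION: norm = spam words / notspam words.
    # The definition actually used by rebuildspamdb.pl may differ.
    return spam_word_count / ham_word_count


words_per_spam = spam_words / spam_msgs   # about 169 words per message
words_per_ham = ham_words / ham_msgs      # 640 words per message

current = norm(spam_words, ham_words)     # about 0.69

# Hypothetical: MaxFiles = 6500 lets notspam grow to 6500 messages of the
# same average length, while spam is already at the limit and stays put.
projected = norm(spam_words, words_per_ham * 6500)   # about 0.26

print(f"current norm:   {current:.2f}")
print(f"projected norm: {projected:.2f}")
```

Under those assumptions the norm moves further from 1, not closer.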

A better solution would be to modify rebuildspamdb.pl to look at the 
spam/notspam word counts in the corpus and impose some limits on the number 
of words included in the spamdb output to bring the norm back into the 
acceptable range.
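
To make the idea concrete, here is a minimal sketch in Python rather than 
Perl. It is not how rebuildspamdb.pl works today. The function name 
balance_word_counts and the max_ratio parameter are hypothetical, and each 
message is assumed to be already split into a list of words. It stops adding 
messages from the heavier side once a word budget is reached.

```python
def balance_word_counts(spam_docs, ham_docs, max_ratio=1.1):
    """Trim the heavier corpus so its word total stays within
    max_ratio times the word total of the lighter corpus.

    spam_docs and ham_docs are lists of word lists, one per message.
    max_ratio is a hypothetical tuning knob and should be >= 1.
    Returns (spam_kept, ham_kept).
    """
    spam_total = sum(len(doc) for doc in spam_docs)
    ham_total = sum(len(doc) for doc in ham_docs)
    # Word budget applied to both sides; with max_ratio >= 1 the lighter
    # side always fits, so only the heavier side is actually trimmed.
    budget = int(min(spam_total, ham_total) * max_ratio)

    def take(docs):
        kept, used = [], 0
        for doc in docs:
            if used + len(doc) > budget:
                break  # stop at the first message that would overflow
            kept.append(doc)
            used += len(doc)
        return kept

    return take(spam_docs), take(ham_docs)
```

Messages are kept in the order given, so sorting newest-first beforehand 
would keep the most recent mail. Whether to cut whole messages or individual 
words is a design choice this sketch does not settle.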

Thoughts/comments?

Regards,

Craig




----- Original Message ----- 
From: "Steve Thompson" <[EMAIL PROTECTED]>
To: "'ASSP development mailing list'" <[email protected]>
Sent: Tuesday, January 22, 2008 8:35 AM
Subject: Re: [Assp-test] Do norm optimizing question


>
>>
>> Ok, i changed that in the latest release.
>>
>> fritz
>>
>
> What exactly was changed here?
>
> Is it supposed to increase the norm if it is low?
>
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Assp-test mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/assp-test 


