On 04/22/2010 09:33 AM, news...@acrocat.com wrote:
> I am wondering... I have a bunch of ham mail that is in mbox format.
>
> Should I use dspam_train and run through say 2500 ham messages? Or should
> I balance that out with spam as well? Or just let the training happen on
> incoming new messages?
>
>    

Hello,

In the long run, training from scratch always gives you better
results. But no one wants to wait all that time before seeing results,
right?

That said, you should balance your training, though not necessarily by
feeding in the same amount of each. Get an idea of how much spam and ham
you actually receive, and try to train using that proportion.
That's at least what has always given me the best results. Others might
have other ideas though.
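
To make the proportion idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the function name and its parameters are made up for this example and are not part of DSPAM. It takes the number of ham and spam messages you have on hand, plus the ham and spam counts you observed over some sample period, and returns how many of each to train on so that the mix matches what you really receive.

```python
from fractions import Fraction


def training_counts(ham_available, spam_available, ham_seen, spam_seen):
    """Return (ham, spam) counts to train on.

    ham_available / spam_available: messages you have in your corpora.
    ham_seen / spam_seen: messages actually received in a sample period
    (hypothetical inputs, measured however suits your setup).
    """
    if ham_seen <= 0 or spam_seen <= 0:
        raise ValueError("observed counts must be positive")
    # Largest scale factor for which neither corpus runs out.
    scale = min(Fraction(ham_available, ham_seen),
                Fraction(spam_available, spam_seen))
    # Exact rational arithmetic, rounded down to whole messages.
    return int(scale * ham_seen), int(scale * spam_seen)


# 2500 ham and 10000 spam on hand; real traffic is roughly 40% ham, 60% spam.
print(training_counts(2500, 10000, 400, 600))
```

With those example numbers, all 2500 ham messages get used and the spam side is capped at 3750, so the training mix keeps the observed 40/60 split instead of being 50/50.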

Regards,

Hugo Monteiro.

-- 
fct.unl.pt:~# cat .signature

Hugo Monteiro
Email    : hugo.monte...@fct.unl.pt
Telefone : +351 212948300 Ext.15307
Web      : http://hmonteiro.net

Divisão de Informática
Faculdade de Ciências e Tecnologia da
                   Universidade Nova de Lisboa
Quinta da Torre   2829-516 Caparica   Portugal
Telefone: +351 212948596   Fax: +351 212948548
www.fct.unl.pt                ap...@fct.unl.pt

fct.unl.pt:~# _


------------------------------------------------------------------------------
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
