On 04/22/2010 09:33 AM, news...@acrocat.com wrote:
> I am wondering... I have a bunch of ham mail that is in mbox format.
>
> Should I use dspam_train and run through, say, 2500 ham messages? Or should
> I balance that out with spam as well? Or just let the training happen on
> incoming new messages?
Hello,

In the long run, training from scratch is what always gives you better results. But no one wants to wait that long to start seeing results, right?

That said, you should balance your training, though not necessarily by feeding the same amount of each. Get an idea of how much spam and ham you actually receive and train using that proportion. At least, that's what has always given me the best results. Others might have other ideas, though.

Regards,

Hugo Monteiro.

--
fct.unl.pt:~# cat .signature
Hugo Monteiro
Email : hugo.monte...@fct.unl.pt
Phone : +351 212948300 Ext.15307
Web : http://hmonteiro.net

Divisão de Informática
Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa
Quinta da Torre 2829-516 Caparica Portugal
Phone: +351 212948596 Fax: +351 212948548
www.fct.unl.pt ap...@fct.unl.pt
fct.unl.pt:~# _
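To make the "train in proportion" advice concrete, here is a minimal Python sketch of the arithmetic. It assumes you have counted the spam and ham you received over some sample period (say, a week) and know how many messages of each kind you have on hand for training. The function name `proportional_training_counts` is hypothetical, and the sketch only works out how many messages to feed from each corpus; it does not invoke dspam_train.

```python
import math
from fractions import Fraction


def proportional_training_counts(spam_received, ham_received,
                                 spam_available, ham_available):
    """Hypothetical helper: return (spam_to_train, ham_to_train).

    Scales the observed spam:ham ratio up as far as possible without
    using more messages than either training corpus contains.
    """
    if spam_received < 0 or ham_received < 0 or spam_received + ham_received == 0:
        raise ValueError("need a non-empty, non-negative sample of received mail")

    # Each corpus caps how far the observed counts can be scaled up.
    limits = []
    if spam_received:
        limits.append(Fraction(spam_available, spam_received))
    if ham_received:
        limits.append(Fraction(ham_available, ham_received))
    scale = min(limits)  # the scarcer corpus decides the size of the set

    # Exact rational arithmetic, rounded down to whole messages.
    return math.floor(scale * spam_received), math.floor(scale * ham_received)


# Hypothetical sample: 800 spam and 200 ham received in a week,
# with 6000 spam and 2500 ham available for training.
print(proportional_training_counts(800, 200, 6000, 2500))
```

Here the spam corpus runs out first, so the sketch would train on all 6000 spam and 1500 of the 2500 ham, keeping the 4:1 ratio seen in the incoming mail. `Fraction` is used so the scaling is exact and the rounding down happens only once at the end.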