Yes, I get your point, but these issues are part of any machine learning project, and feature extraction will have to be done with the synthetic data set in mind.
On 17 April 2013 22:05, Terri Oda <te...@zone12.com> wrote:

> Finding sources of spam (like that one) isn't that hard; it's finding
> sources of legit email combined with spam and classified and processed in
> the same way that's challenging. As I said, you can combine a spam source
> like this with a publicly available mailing list to make a synthetic set,
> but scientifically speaking, those aren't really preferred ways to handle
> data because they come from multiple sources.

Well, in this regard the only thing I can do is keep looking. I am also aware that data drawn from different sources can be skewed, but these things are never perfect and there is always scope for improvement. I think our aim should be to implement a rudimentary classifier with fairly good performance to start with.

_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9