Yes, I get your point, but these issues are part of any machine learning
project, and feature extraction will have to be done with the synthetic
data set in mind.


On 17 April 2013 22:05, Terri Oda <te...@zone12.com> wrote:

>
>
> Finding sources of spam (like that one) isn't that hard; it's finding
> sources of legit email combined with spam and classified and processed in
> the same way that's challenging.  As I said, you can combine a spam source
> like this with a publicly available mailing list to make a synthetic set,
> but scientifically speaking, those aren't really preferred ways to handle
> data because they come from multiple sources.
>
>
>
Well, in this regard the only thing I can do is keep looking. I am also
aware that data drawn from different sources can be skewed, but these
things are never perfect and there is always room for improvement. I
think our aim should be to implement a rudimentary classifier with
fairly good performance to start with.
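
To make "rudimentary classifier" concrete, here is a minimal sketch of what
I have in mind. It assumes a word-count naive Bayes model with add-one
smoothing, which is my choice for illustration and not something we have
agreed on. The names (tokenize, train, classify) and the four sample
messages are made up; a real synthetic set would combine a spam source with
a public mailing list archive, as described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labelled_messages):
    """Count words and messages per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for text, label in labelled_messages:
        message_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, message_counts


def classify(text, word_counts, message_counts):
    """Return the label with the higher naive Bayes log score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(message_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(message_counts[label] / total_messages)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (label, word) pairs non-zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical stand-ins for the two sources of a synthetic set.
spam_source = ["Buy cheap pills now", "Win money now, click here"]
list_source = ["Patch for the archiver attached",
               "Meeting notes for the next release"]

synthetic_set = ([(text, "spam") for text in spam_source]
                 + [(text, "ham") for text in list_source])
model = train(synthetic_set)
```

Calling classify("cheap pills, click now", *model) then returns a label for
a new message. Because the two classes come from different sources, the
model may partly learn source differences instead of spam features, which
is the skew mentioned above.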
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9
