Edward J. Yoon wrote:
Hi all,

To reduce the manual management effort for a planet-scale
mail service, I'm considering statistical spam filtering with
the SpamAssassin, Hadoop (distributed computing), and Hama (parallel
matrix computing) projects.

Any advice (or experience) would be appreciated!

Have you spoken to the SpamAssassin project? They'd probably love to get involved in a streams-based filtering system. One thing to know is that a lot of their test data is private: they have to include lots of legitimate email alongside the spam, so their big datasets aren't always public.

Talk to Justin Mason and the SpamAssassin developers.

-steve
