Edward J. Yoon wrote:
Hi all,
To reduce the manual management effort for a planet-scale mail
service, I'm considering statistical spam filtering with the
SpamAssassin, Hadoop (distributed computing), and Hama (parallel matrix
computing) projects.
Any advice (or experience) would be welcome!
Have you spoken to the SpamAssassin developers? They'd probably love to
get involved in a stream-based filtering system. One thing to know is
that a lot of their test data is private, because it has to include
lots of legitimate email alongside the spam, so their big datasets
aren't always public.
Talk to Justin Mason and the SpamAssassin developers.
-steve