On Fri, Nov 26, 2004 at 02:06:05PM -0800, Vaishnavi Sannidhanam wrote:
> I am a student at the University of Washington and I am doing a project on
> classifying spam. I was wondering where could I find the spam assassin
> corpus of ham and spam mails and where would I also find some tools to
> process these mails.

Hi.

Unfortunately there is no single "SpamAssassin corpus".  Everyone involved
in development (including the folks who help out with score generation
and testing) keeps their own private corpus of messages.  The tools
(specifically mass-check) under the "masses" directory (see the tarball)
are run over each corpus to generate logs that record which messages were
processed and what the results were (namely which rules hit).

That information is then used to generate the scores, determine which rules
are worth keeping during development, etc.
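
To give a rough idea of what "using the logs" can look like, here is a
minimal Python sketch that tallies how often each rule hits in spam versus
ham.  It is only an illustration, not the actual scripts in "masses".  It
assumes a simplified log line of whitespace-separated fields (a "Y" or "."
flag, the score, the message path, then a comma-separated list of rules
that hit, optionally followed by metadata); check the real mass-check
output before relying on that layout.  The function names and the sample
paths and rule names are made up for the example.

```python
from collections import Counter


def parse_log_line(line):
    """Parse one log line in the ASSUMED simplified format:

        <Y|.> <score> <path> <RULE1,RULE2,...> [metadata]

    Returns a dict, or None for blank lines, comments, or short lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(None, 4)
    if len(fields) < 4:
        return None
    flagged, score, path, rules = fields[:4]
    return {
        "flagged": flagged == "Y",
        "score": float(score),
        "path": path,
        "rules": [r for r in rules.split(",") if r],
    }


def rule_hit_rates(lines):
    """Return {rule: fraction of parsed messages that hit the rule}."""
    hits = Counter()
    total = 0
    for line in lines:
        record = parse_log_line(line)
        if record is None:
            continue
        total += 1
        # Count each rule at most once per message.
        hits.update(set(record["rules"]))
    if total == 0:
        return {}
    return {rule: count / total for rule, count in hits.items()}


def compare_rules(spam_lines, ham_lines):
    """Return {rule: (spam hit rate, ham hit rate)} for every rule seen."""
    spam = rule_hit_rates(spam_lines)
    ham = rule_hit_rates(ham_lines)
    return {
        rule: (spam.get(rule, 0.0), ham.get(rule, 0.0))
        for rule in sorted(set(spam) | set(ham))
    }


if __name__ == "__main__":
    # Hypothetical sample data; real logs would be read from files.
    spam_log = [
        "Y 12.3 /corpus/spam/1 RULE_A,RULE_B time=1",
        "Y 8 /corpus/spam/2 RULE_A time=2",
    ]
    ham_log = [
        ". 0.1 /corpus/ham/1 RULE_B time=3",
        ". 0 /corpus/ham/2 RULE_C time=4",
    ]
    for rule, (s, h) in compare_rules(spam_log, ham_log).items():
        print(f"{rule}: spam={s:.2f} ham={h:.2f}")
```

A rule that hits much more often in spam than in ham (RULE_A in the sample
data) is the kind that looks worth keeping; one that hits both equally
often (RULE_B) tells you very little.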

There is some more information available at:

http://wiki.apache.org/spamassassin/DevelopmentStuff

-- 
Randomly Generated Tagline:
Two-hundred-thirty-nine pounds?!  I'm a blimp!  Why are all the good
 things so tasty?
 
                -- Homer Simpson
                   Brush With Greatness
