On Fri, Nov 26, 2004 at 02:06:05PM -0800, Vaishnavi Sannidhanam wrote:
> I am a student a University of Washington and I am doing a project on
> classifying spam. I was wondering where could I find the spam assassin
> corpus of ham and spam mails and where would I also find some tools to
> process these mails.

Hi. Unfortunately, there is no single "SpamAssassin corpus". Everyone
involved in development (including the folks who help out with score
generation and testing) has their own private corpus of messages.

The tools under the "masses" directory of the tarball (specifically
mass-check) are used to generate logs from a corpus. The logs record
which messages were processed and the result of processing each one
(namely, which rules hit). That information is then used to generate
the scores, determine which rules are worth keeping during development,
etc.

There is some more information available at:
http://wiki.apache.org/spamassassin/DevelopmentStuff

-- 
Randomly Generated Tagline:
Two-hundred-thirty-nine pounds?! I'm a blimp! Why are all the good
things so tasty?
        -- Homer Simpson
           Brush With Greatness
