Hi Theo, I found a SpamAssassin corpus that contains ~3500 ham and spam messages. Is there a larger corpus available, or a set of smaller corpora that I could combine into a bigger one? Please let me know where I might get them.
Thank you very much; I really appreciate all your help.

Vaishnavi

-----Original Message-----
From: Theo Van Dinter [mailto:[EMAIL PROTECTED]
Sent: Friday, November 26, 2004 3:50 PM
To: [email protected]
Subject: Re: Spam assassin corpus

On Fri, Nov 26, 2004 at 02:06:05PM -0800, Vaishnavi Sannidhanam wrote:
> I am a student at the University of Washington and I am doing a project on
> classifying spam. I was wondering where I could find the spam assassin
> corpus of ham and spam mails and where I would also find some tools to
> process these mails.

Hi. Unfortunately there is no single "SpamAssassin corpus". All of the people involved in development (including the folks who help out with score generation and testing) each have their own private corpus of messages.

The tools (specifically mass-check) under the "masses" directory (see the tarball) are used to generate logs from the corpus specifying the messages processed and the results from the processing (namely what rules hit). That information is then used to generate the scores, determine which rules are worth keeping during development, etc.

There is some more information available at: http://wiki.apache.org/spamassassin/DevelopmentStuff

--
Randomly Generated Tagline:
Two-hundred-thirty-nine pounds?! I'm a blimp! Why are all the good things so tasty?
    -- Homer Simpson, Brush With Greatness
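
On the question of putting several smaller corpora together: one possible approach is sketched below. It is not part of SpamAssassin or its "masses" tools. It assumes a hypothetical layout in which each corpus is a directory with "ham" and "spam" subdirectories holding one message per file, and it drops byte-identical duplicates, which are likely when collections overlap. The function name merge_corpora is made up for this illustration.

```python
import hashlib
from pathlib import Path


def merge_corpora(source_dirs, label):
    """Return unique message files for one label ("ham" or "spam").

    Assumed (hypothetical) layout: <source_dir>/<label>/<one file per message>.
    Files with identical bytes are kept only once across all sources.
    """
    seen = set()    # SHA-256 digests of messages already collected
    merged = []     # paths of the unique messages, in discovery order
    for source in source_dirs:
        label_dir = Path(source) / label
        if not label_dir.is_dir():
            continue  # this corpus has no messages with this label
        for path in sorted(label_dir.iterdir()):
            if not path.is_file():
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                continue  # exact duplicate of a message seen earlier
            seen.add(digest)
            merged.append(path)
    return merged
```

Calling merge_corpora(["corpus_a", "corpus_b"], "spam") would return the unique spam files from both directories. Exact-byte hashing only catches verbatim copies; the same message delivered twice with different headers would still count as two files.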
