Noel,

Though this is probably a good source of "SPAM", one also needs to be able
to feed in "HAM"; otherwise the routines won't work as well as they
could <g>...if at all.

That being said, yes, the package I submitted already includes a class for
importing SPAM from a file. I can review it and see what changes would be
needed to import a whole folder or something similar.
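
As a rough idea of what a folder import might look like, here is a minimal
sketch. The class name FolderCorpusReader and the method readMessages are
hypothetical and are not part of the submitted package; the sketch also
assumes one message per file and a JDK with java.nio.file available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (an assumption for illustration, not the submitted class).
class FolderCorpusReader {

    /** Reads every regular file directly inside dir as one raw message. */
    static List<String> readMessages(Path dir) throws IOException {
        List<Path> files;
        // Files.list is not recursive; sub-folders are skipped by the filter.
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<String> messages = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a character, so raw mail never fails to decode.
            messages.add(new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1));
        }
        return messages;
    }
}
```

Each returned string could then be handed to whatever class already imports
a single SPAM file.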

One note of caution: my routines include the headers in the analysis (for
better or worse), so they will tend to "label" the sender as a spammer. As
far as I could tell from a quick browse, the archive includes only the
headers from the mail session in which the user forwarded the message into
the archive, which means that the person who reported the SPAM could be
labelled as a SPAMmer.

Make sense?

The Bayesian analysis my routines perform (like the others that I've seen)
really works best on YOUR spam, not other people's. That does not mean the
spamarchive wouldn't be a good place to get test SPAM data from, but you'd
need a HAM feed too in order to perform even basic testing.
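
To show why a HAM feed matters, here is a simplified sketch of a per-token
score. It is not the formula my routines (or any particular filter) use;
TokenStats, train and spamProbability are hypothetical names, and the score
is just the token's spam frequency divided by the sum of its spam and ham
frequencies.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration (an assumption, not the submitted matcher).
class TokenStats {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamMessages;
    private int hamMessages;

    /** Counts each lower-cased word of text towards the spam or ham corpus. */
    void train(String text, boolean isSpam) {
        if (isSpam) spamMessages++; else hamMessages++;
        Map<String, Integer> counts = isSpam ? spamCounts : hamCounts;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
    }

    /** Score in [0, 1]; 0.5 means no evidence either way. */
    double spamProbability(String token) {
        // Occurrences per trained message in each corpus.
        double spamFreq = spamMessages == 0 ? 0.0
                : spamCounts.getOrDefault(token, 0) / (double) spamMessages;
        double hamFreq = hamMessages == 0 ? 0.0
                : hamCounts.getOrDefault(token, 0) / (double) hamMessages;
        if (spamFreq + hamFreq == 0.0) return 0.5; // token never seen
        return spamFreq / (spamFreq + hamFreq);
    }
}
```

With only SPAM trained, hamFreq is always zero, so every token that has
been seen scores 1.0, including perfectly ordinary words. Only once HAM is
fed in can common words be pulled back towards the middle.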

-Chris

> -----Original Message-----
> From: Noel J. Bergman [mailto:[EMAIL PROTECTED]]
> Sent: Monday, January 27, 2003 4:36 PM
> To: James Users List
> Subject: RE: Spam filtering mailets wanted...
>
>
> With respect to our friend Chris' Bayesian matcher, are people aware of
> www.spamarchive.org?  Seems to me that we could test Chris' filter with
> their archives.
>
> Chris, do you think you could look at producing a standalone app to
> populate the tables from their archives?  Alternatively, code could
> read the archives and send the messages to the Bayesian matcher.
>
>       --- Noel
>
>
> --
> To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
>
>



