[Bug 4876] New: CRM114 Plugin for SpamAssassin (comments, please!)

bugzilla-daemon Mon, 24 Apr 2006 06:30:17 -0700

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4876


           Summary: CRM114 Plugin for SpamAssassin (comments, please!)
           Product: Spamassassin
           Version: 3.0.3
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Plugins
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


Our SA implementation (via MailScanner and Exim) is working very well, dropping
some 90% of spam immediately and only missing (i.e. not tagging) some 5% of the
remaining "questionable" stuff.  Still, the amount of mail that gets tagged
(above a score of 4 and below 8) and passed to users is significant.

I wanted to try using CRM114 within SpamAssassin to augment the existing "bayes"
learner.  I wanted it to discriminate between spam/ham for messages that could
not be classified accurately by existing rules and, as such, not waste its
resources for things already handled elsewhere.  I've written this plug-in to
test it out.

Right now, I'm just using the basic "classifymail.crm" script that comes with
CRM114 with a few modifications as to where to find files.

This is my first attempt at a plug-in and working with v3 of SpamAssassin, so I
apologize if it's not as elegant as it could be.

A few notes about the plugin:  .../SpamAssassin/Plugin/CRM114.pm

* It skips itself unless the current score is within the -5 to 15 range.  I
believe this will avoid running it for messages that are already obvious.  I
chose this range on the assumption that the rule weightings would never be more
than +/- 10 and thus would never be able to change the final decision on
messages outside of that range.  I've set the rule priorities to run this rule
last.

* I intended to train CRM only with messages that users supply as either false
positives or false negatives.  This contrasts with the standard learning system
that auto-learns from everything.  (I know I can disable auto-learn, but I want
CRM to work on a _different_ problem than the existing rules.)

* I still have to figure out how to actually do that training.  Training on the
original messages would use a different data set than the "rendered" text it's
classifying.  What I need is a method to have SpamAssassin render a message and
dump its output rather than running rules on it.
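
The score window from the first note can be sketched as below.  This is an
illustration in Python, not the plugin's Perl code, and every name in it is
hypothetical.  It assumes a spam decision threshold of 5 and a maximum rule
weight of 10, which together reproduce the -5 to 15 range; with a different
threshold the window would shift accordingly.

```python
# Illustrative sketch only; this is not the plugin's actual Perl code.
# THRESHOLD and MAX_WEIGHT are assumed values chosen to match the
# -5 to 15 range described in the notes above.

THRESHOLD = 5.0    # assumed spam decision threshold
MAX_WEIGHT = 10.0  # assumed largest absolute score the CRM114 rule may add


def score_window(threshold=THRESHOLD, max_weight=MAX_WEIGHT):
    """Return the (low, high) score range in which one rule can still
    move a message across the decision threshold."""
    return threshold - max_weight, threshold + max_weight


def should_run_crm114(current_score, threshold=THRESHOLD, max_weight=MAX_WEIGHT):
    """True when the classifier's weight could still change the verdict."""
    low, high = score_window(threshold, max_weight)
    return low <= current_score <= high


if __name__ == "__main__":
    for score in (-12.0, 3.5, 6.0, 22.0):
        action = "run" if should_run_crm114(score) else "skip"
        print(f"{score:6.1f} -> {action} CRM114")
```

A message scoring 22 stays above the threshold even if the rule subtracts the
full 10 points, so classifying it would be wasted work; the same holds in the
other direction for a message scoring -12.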


I'd appreciate any comments people have.  I've placed the plugin code in the
public domain.  The CRM filter file did not have a copyright notice on the
original; since it was an example, I suspect it's also public domain but can't
say for sure.  I am sure, however, that anybody with some CRM knowledge could
write a better classifier than what I present here.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
