On 09/02/2014 11:06 AM, Ted Mittelstaedt wrote:

masscheck runs against your spam and ham.  But masscheck does not know
if what you're feeding it is actually ham or spam until you have gone
through your corpora and sorted them: moved the spam to the spam folder
and the ham to the ham folder (assuming, that is, that you get any false
positives).  That is why you say you want the corpora cleaned and hand
classified.

This is something that I only do every once in a while when I'm
preparing corpora for my bayes database.  If I set up masscheck to
look at my inbox and my junk mail folder on a nightly basis, there
is no guarantee that I happened to get to my mail that day or that week
even to make sure that only ham is in my inbox and only spam is in
my junk mail folder.

This is where you need *commitment* (a few hours/week) to sort your stuff. If you can't be bothered, it's much easier to sit back and drop the load on others...

If I have a folder full of spam that my local install of SpamAssassin
has already marked as spam, then how does telling the SA project
"yep, ya got that right" change anything in the rules scoring?

It helps by pushing autopromoting sandbox rules, raising scores, etc.


There is a lack of explanation on the masscheck page as to how and
why it's useful.  And it is also clear that accidentally leaving spam
(spam that has not been identified as spam by SA) in your ham folder,
and false positives (ham) in your spam folder, is not going to help
masscheck any - if anything it's going to make the SA scoring worse.
That seems to me to be very important.
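
To make that concrete, here is a toy sketch in Python. It is not the masscheck code: the rule name, the message counts and the hit_rates helper are all invented for illustration. It only shows how a handful of spam messages left in the ham folder make a perfectly good rule look as if it fires on ham.

```python
# Toy illustration only: this is NOT the masscheck code, and the rule
# name and message counts below are invented for the example.

def hit_rates(messages, rule):
    """Return (spam_hit_rate, ham_hit_rate) for `rule`.

    `messages` is a list of (label, rules_hit) pairs, where label is
    "spam" or "ham" according to the folder the message was filed in.
    """
    totals = {"spam": 0, "ham": 0}
    hits = {"spam": 0, "ham": 0}
    for label, rules_hit in messages:
        totals[label] += 1
        if rule in rules_hit:
            hits[label] += 1
    return (hits["spam"] / totals["spam"], hits["ham"] / totals["ham"])


RULE = "HYPOTHETICAL_RULE"  # made-up rule name

# Cleanly sorted corpus: the rule hits 90 of 100 spam and 0 of 100 ham.
clean = (
    [("spam", {RULE})] * 90
    + [("spam", set())] * 10
    + [("ham", set())] * 100
)

# Same mail, but 5 spam messages that hit the rule were left in the
# ham folder, so they are counted as ham.
dirty = (
    [("spam", {RULE})] * 85
    + [("spam", set())] * 10
    + [("ham", {RULE})] * 5
    + [("ham", set())] * 100
)

clean_spam, clean_ham = hit_rates(clean, RULE)
dirty_spam, dirty_ham = hit_rates(dirty, RULE)
print(f"clean corpus: spam {clean_spam:.3f}, ham {clean_ham:.3f}")
print(f"dirty corpus: spam {dirty_spam:.3f}, ham {dirty_ham:.3f}")
```

With the clean corpus the rule never hits ham. With five misfiled messages it appears to hit roughly 5% of ham, so anything that judges the rule by these rates would treat it as riskier than it really is.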

Perhaps that is why so few participate?  They do not understand why
masscheck is important to the SA project because the documentation on
it does not explain why.

Filling a wiki with lots of information tends to scare people away. Those who are truly interested in contributing will ask for information, hints and help, and there are always devs available and willing to help.

Most "others" out there using OSS packages do not have the skills to
contribute development time, even to contribute rules that do not
have unintended consequences.  You might think it simple to write
a rule, but it's not the writing that is the problem; it is the
thinking about the consequences.

Which is why rules are not published blindly: there's the GA, which does a pretty good job of weeding the bad stuff out, and user feedback isn't ignored.

I've seen some real showstoppers in SpamAssassin rules such as the time
that someone wrote a rule to target certain spam that ended up
triggering off Outlook Express.

Don't know when that happened or who wrote that rule, but I do know that there are devs who are *very* sensitive to that sort of stuff leaking into SA's ruleset and who fight it real loud.

I just think the SA developers are being just a bit too conservative on this.

And that's the good thing: the SA SVN tree gives you all the tools to run your own fork, with GA/Perceptron and ALL the goodies. (You just need to glue the whole party together and no, that is *not* well documented.)


For starters, as an SA user I do not feel the project is served by
multiple sa-update channels promulgating different rulesets.  If I
had the coding ability to create a huge body of rules on par with
the existing SA rules, I would absolutely not set it up as a competing
ruleset.

Contributing by setting up and maintaining an extra sa-update channel, while others take over writing the rules, is also an approach, but again the magic word is *commitment*.

And now back to reviewing my spam trap inboxes for today's masscheck run...





