Re: [Mimedefang] learner indicated ham

Bill Cole Tue, 12 Aug 2014 10:14:34 -0700

On 11 Aug 2014, at 10:22, Justin Edmands wrote:

Bill,
Thank you very much for the response. The detail is much appreciated.
As Ged mentioned, not vague, helpful to say the least. The part about
highly trusted rules caught my attention:


"Another way to increase autolearning without going all the way to the
"learn on error" behavior is to flag rules that you trust highly as
"autolearn_force" so that messages matching them won't ever be
excluded from autolearning based on the existing Bayes DB disagreeing
with the deterministic rules."

I think these will get me started:

tflags URIBL_DBL_SPAM autolearn_force
tflags URIBL_JP_SURBL autolearn_force
tflags URIBL_BLACK autolearn_force
tflags INVALID_DATE autolearn_force

Any others that are definites?

That's a hard question for anyone to answer without knowing your
mailstream's quirks. I can't tell you who your users are and what sort
of mail they want that matches which rules. The default SA rules have
mostly low scores because they are all individually highly error-prone.

I'm especially wary about putting too much trust in individual rules
because I get lots of mail that talks about spam, often with things like
lists of evil domains that trigger URIBL rules. And INVALID_DATE shows
up in a surprising number of ethically upstanding but technically sordid
messages (e.g. Terminix customer notices.) This is why I reserve
autolearn_force for meta-rules, since it carries a risk of turning a few
false positives into a bad Bayes DB. The specific example of what I
described that I can share is this locally-defined rule:


describe URIBL_MULTI1 Multiple URIBL hits
meta URIBL_MULTI1 URIBL_DBL_SPAM + URIBL_RED + URIBL_BLACK + URIBL_SBL + URIBL_WS_SURBL + URIBL_OB_SURBL + URIBL_JP_SURBL + URIBL_SC_SURBL > 2
score URIBL_MULTI1 10
tflags URIBL_MULTI1 autolearn_force

That means that if 3 or more of 8 different URIBL tests hit on a
message, I tack on an extra 10 points and override the learner
protections. I should add a note of warning by example: last week a
thread in the Postfix users list was started with a message including a
long list of spammer domains, causing the original message and any that
fully quoted it to match *6* of those URIBLs. If your mailstream
includes mail discussing spam, you have to take precautions to protect
against such things ruining your Bayes DB.
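
To make the "more than 2 of 8" arithmetic concrete, here is a rough
Python sketch of how that meta expression evaluates. It is only an
illustration, not how SpamAssassin implements meta rules: it assumes
each URIBL rule contributes 1 when it hits and 0 otherwise, and the
function name uribl_multi1 is made up for this example.

```python
# The eight URIBL rule names that appear in the meta rule.
URIBL_RULES = (
    "URIBL_DBL_SPAM", "URIBL_RED", "URIBL_BLACK", "URIBL_SBL",
    "URIBL_WS_SURBL", "URIBL_OB_SURBL", "URIBL_JP_SURBL", "URIBL_SC_SURBL",
)


def uribl_multi1(hits):
    """Return True when more than 2 of the 8 URIBL rules are in `hits`.

    `hits` is a set of rule names that matched a message. Each listed
    rule is assumed to count as 1 when hit and 0 otherwise.
    """
    count = sum(1 for rule in URIBL_RULES if rule in hits)
    return count > 2


if __name__ == "__main__":
    # Two URIBL hits: the meta rule stays quiet.
    print(uribl_multi1({"URIBL_BLACK", "URIBL_DBL_SPAM"}))
    # Three URIBL hits: the meta rule fires.
    print(uribl_multi1({"URIBL_BLACK", "URIBL_DBL_SPAM", "URIBL_JP_SURBL"}))
```

Hits on rules outside the list (INVALID_DATE, say) do not count toward
the threshold, which is the point of bundling only the URIBL tests.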

My other autolearn_force rules are also meta-rules that bundle multiple
rules, but I unfortunately cannot freely share their details as the
constituent rules come from private (i.e. encumbered) sources. The
general process I use is to look for clusters of rules (positive OR
negative) that often hit together on mail that gets a Bayes score in the
opposite direction. Before SA 3.4 I just set high scores on those
meta-rules to assure rejection, but autolearn_force improves on that.
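
One way that cluster hunt could be sketched in Python is shown here.
This is a guess at a workable approach, not the actual procedure
described above: it assumes you have already extracted, per message, the
set of rules that hit and the Bayes probability (0.0 to 1.0), and that
you have a table of rule scores. The function name disagreeing_pairs,
the 0.5 cutoff, and the input layout are all assumptions for
illustration.

```python
from collections import Counter
from itertools import combinations


def disagreeing_pairs(messages, rule_scores):
    """Count rule pairs that hit together on mail where Bayes disagrees.

    messages    -- iterable of (set_of_rule_names, bayes_probability)
    rule_scores -- dict of rule name -> score (positive means spammy)
    """
    pairs = Counter()
    for rules, bayes in messages:
        if bayes < 0.5:
            # Bayes leans ham, so collect the spam-indicating rules.
            against = sorted(r for r in rules if rule_scores.get(r, 0) > 0)
        elif bayes > 0.5:
            # Bayes leans spam, so collect the ham-indicating rules.
            against = sorted(r for r in rules if rule_scores.get(r, 0) < 0)
        else:
            continue
        pairs.update(combinations(against, 2))
    return pairs


if __name__ == "__main__":
    # Made-up sample data; the scores are placeholders, not SA defaults.
    scores = {"URIBL_BLACK": 1.0, "URIBL_DBL_SPAM": 1.0, "INVALID_DATE": 1.0}
    sample = [
        ({"URIBL_BLACK", "URIBL_DBL_SPAM"}, 0.1),
        ({"URIBL_BLACK", "URIBL_DBL_SPAM", "INVALID_DATE"}, 0.2),
        ({"URIBL_BLACK"}, 0.9),
    ]
    for pair, count in disagreeing_pairs(sample, scores).most_common():
        print(count, pair)
```

Pairs near the top of that count are candidates for a meta-rule, but
each one still needs checking by hand against real mail before it gets
autolearn_force.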

