On 11 Aug 2014, at 10:22, Justin Edmands wrote:

Bill,
Thank you very much for the response. The detail is much appreciated.
As Ged mentioned, not vague, helpful to say the least. The part about
highly trusted rules caught my attention:

"Another way to increase autolearning without going all the way to the
"learn on error" behavior is to flag rules that you trust highly as
"autolearn_force" so that messages matching them won't ever be
excluded from autolearning based on the existing Bayes DB disagreeing
with the deterministic rules."

I think these will get me started:

tflags URIBL_DBL_SPAM autolearn_force
tflags URIBL_JP_SURBL autolearn_force
tflags URIBL_BLACK autolearn_force
tflags INVALID_DATE autolearn_force

Any others that are definites?

That's a hard question for anyone to answer without knowing your mailstream's quirks. I can't tell you who your users are and what sort of mail they want that matches which rules. The default SA rules have mostly low scores because they are all individually highly error-prone.

I'm especially wary about putting too much trust in individual rules because I get lots of mail that talks about spam, often with things like lists of evil domains that trigger URIBL rules. And INVALID_DATE shows up in a surprising number of ethically upstanding but technically sordid messages (e.g. Terminix customer notices). This is why I reserve autolearn_force for meta-rules: applied to a single rule, it carries a risk of turning a few false positives into a bad Bayes DB. The one specific example of this approach that I can share is this locally-defined rule:

describe URIBL_MULTI1 Multiple URIBL hits
meta URIBL_MULTI1 URIBL_DBL_SPAM + URIBL_RED + URIBL_BLACK + URIBL_SBL + URIBL_WS_SURBL + URIBL_OB_SURBL + URIBL_JP_SURBL + URIBL_SC_SURBL > 2
score URIBL_MULTI1 10
tflags URIBL_MULTI1 autolearn_force

That means that if 3 or more of 8 different URIBL tests hit on a message, I tack on an extra 10 points and override the learner protections. I should add a note of warning by example: last week a thread on the Postfix users list started with a message that included a long list of spammer domains, causing the original message and any reply that fully quoted it to match *6* of those URIBLs. If your mailstream includes mail discussing spam, you have to take precautions to keep such messages from ruining your Bayes DB.
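
To make the threshold arithmetic concrete, here is a minimal Python sketch of how a meta expression like the one above counts hits. It is an illustration only, not SpamAssassin code: the function name uribl_multi1 and the idea of passing in a set of hit rule names are hypothetical, and it assumes each constituent rule contributes exactly 1 to the sum when it hits.

```python
# Illustrative only: mimics the "sum of rules > 2" test in the meta rule.
# Assumption: each constituent rule is worth 1 when it hits, 0 otherwise.
URIBL_RULES = [
    "URIBL_DBL_SPAM", "URIBL_RED", "URIBL_BLACK", "URIBL_SBL",
    "URIBL_WS_SURBL", "URIBL_OB_SURBL", "URIBL_JP_SURBL", "URIBL_SC_SURBL",
]


def uribl_multi1(hits):
    """Return True when more than 2 of the 8 URIBL rules appear in `hits`.

    `hits` is a hypothetical set of rule names that matched one message.
    """
    count = sum(1 for rule in URIBL_RULES if rule in hits)
    return count > 2
```

Under that assumption, a message hitting only URIBL_BLACK and URIBL_DBL_SPAM stays below the threshold, while one hitting six of the listed rules (as in the Postfix thread) clears it easily. Hits on rules outside the list, such as INVALID_DATE, do not count toward the sum.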

My other autolearn_force rules are also meta-rules that bundle multiple rules, but I unfortunately cannot freely share their details as the constituent rules come from private (i.e. encumbered) sources. The general process I use is to look for clusters of rules (positive OR negative) that often hit together on mail that gets a Bayes score in the opposite direction. Before SA 3.4 I just set high scores on those meta-rules to ensure rejection, but autolearn_force improves on that.
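
The cluster-hunting process described above could be sketched roughly as follows. This is not the author's actual tooling, which the message does not show. It assumes you have already extracted, for each message, the set of rules that hit, the Bayes probability, and a trusted spam/ham verdict; the tuple layout, the 0.5 cutoff, and the function name disagreeing_clusters are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def disagreeing_clusters(messages, size=2, min_count=2):
    """Count rule combinations on mail where Bayes disagreed with the verdict.

    `messages` is an iterable of (rules_hit, bayes_prob, is_spam) tuples:
    a set of rule names, a Bayes probability in [0, 1], and a trusted
    verdict. This layout is an assumption made for the example.
    """
    counts = Counter()
    for rules_hit, bayes_prob, is_spam in messages:
        bayes_says_spam = bayes_prob >= 0.5  # assumed cutoff
        if bayes_says_spam == is_spam:
            continue  # Bayes agreed with the verdict; not interesting here
        # Sort so each combination has one canonical ordering.
        for combo in combinations(sorted(rules_hit), size):
            counts[combo] += 1
    return [(combo, n) for combo, n in counts.most_common() if n >= min_count]
```

Combinations that recur in the output are candidates for a meta-rule. Whether any of them deserves autolearn_force still depends on checking it against your own mailstream for false positives.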
_______________________________________________
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID.  You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list [email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
