[Bug 3109] RFE: really simple "this is ham" shortcircuiting

bugzilla-daemon Thu, 23 Mar 2006 05:33:18 -0800

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109

------- Additional Comments From [EMAIL PROTECTED]  2006-03-23 13:32 -------
OK, some agreement with Theo, and some disagreement ;)

> first, I really don't like the override score idea.  I don't think
> there's a real point in doing it, and it's going to confuse people
> who add up the rule scores and see that the total is nowhere near the
> reported score.

-1
I think it provides a better, more familiar UI for users.  Compare
(assuming we fix the "something in x-spam-status" issue below):

  X-Spam-Status: Yes, score=1.9 required=5.0 tests=BAYES_50,HTML_IMAGE_ONLY_28,
        HTML_MESSAGE,MIME_HTML_ONLY,SC_TEST shortcircuit=spam
        autolearn=no version=3.2.0-r372567

eh?  "Yes, score=1.9 required=5.0", wtf?
that's less intuitive, and more likely to cause confusion, than

  X-Spam-Status: Yes, score=15.0 required=5.0 tests=BAYES_50,HTML_IMAGE_ONLY_28,
        HTML_MESSAGE,MIME_HTML_ONLY,SC_TEST shortcircuit=spam
        autolearn=no version=3.2.0-r372567

In my opinion, a mail marked as spam with a low score makes much less
sense than a score that doesn't match the sum of the tests. ;)

(Personally, I'd prefer to see a more "extreme" score value, such as -100
and 100, to give users another hint that something out of the ordinary
has occurred.  But I don't think that's a showstopper.  For example:)

  X-Spam-Status: Yes, score=100.0 required=5.0 tests=BAYES_50,
        HTML_IMAGE_ONLY_28,HTML_MESSAGE,MIME_HTML_ONLY,SC_TEST
        shortcircuit=spam autolearn=no version=3.2.0-r372567
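A minimal sketch of the override being argued for here, in Python rather than SpamAssassin's own Perl. Everything in it is hypothetical: the function names, the `shortcircuit` argument, the 100/-100 values (taken from the suggestion above), and the assumption that the Yes/No verdict is simply "score >= required". It only shows how the reported score, rather than the rule sum, would drive the X-Spam-Status prefix.

```python
from typing import Optional

# Hypothetical override values, following the suggestion above of using
# "extreme" scores so users can tell that shortcircuiting happened.
SHORTCIRCUIT_SCORES = {"spam": 100.0, "ham": -100.0}


def reported_score(rule_sum: float, shortcircuit: Optional[str]) -> float:
    """Return the score to show in X-Spam-Status.

    rule_sum is the sum of the scores of the rules that actually ran;
    shortcircuit is None (no shortcircuit), "spam" or "ham".
    """
    if shortcircuit is None:
        return rule_sum
    return SHORTCIRCUIT_SCORES[shortcircuit]


def status_prefix(rule_sum: float, required: float,
                  shortcircuit: Optional[str]) -> str:
    """Build the "Yes, score=... required=..." part of the header."""
    score = reported_score(rule_sum, shortcircuit)
    verdict = "Yes" if score >= required else "No"
    return f"{verdict}, score={score:.1f} required={required:.1f}"


print(status_prefix(1.9, 5.0, "spam"))  # Yes, score=100.0 required=5.0
print(status_prefix(1.9, 5.0, None))    # No, score=1.9 required=5.0
```

With the override, the 1.9-point message from the first example is reported with a score that agrees with its own verdict.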

> second, there has to be something in the x-spam-status header to
> indicate that SC occurred.

+1
Agreed on that point; I hadn't spotted that this was missing. ;)

> third, mass-check wrt reuse needs to be modified to take SC into
> account.  we'll also need to let people who do mass-checks know that
> they really shouldn't use SC since we want to see the hit results for
> all network tests (the current reuse behavior).

+1
OK, that is a good point.  If the X-Spam-Status line is fixed to
record "shortcircuit=spam" etc., it'll be easy to detect in mass-check.
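Assuming the header records the token exactly as "shortcircuit=spam" or "shortcircuit=ham", a mass-check reuse step could detect it with a simple pattern match. The helpers below are hypothetical (they are not part of mass-check): they unfold the header, treat a missing token or any other value as "no shortcircuit", and refuse reuse for shortcircuited messages, since not all network tests ran on them.

```python
import re
from typing import Optional

# Matches the proposed "shortcircuit=spam" / "shortcircuit=ham" token.
_SC_RE = re.compile(r"(?:^|\s)shortcircuit=(spam|ham)\b")


def shortcircuit_result(x_spam_status: str) -> Optional[str]:
    """Return "spam" or "ham" if the header records a shortcircuit, else None."""
    # Join folded continuation lines into a single line first.
    unfolded = " ".join(x_spam_status.split())
    m = _SC_RE.search(unfolded)
    return m.group(1) if m else None


def can_reuse(x_spam_status: str) -> bool:
    """Only reuse results for messages where every test actually ran."""
    return shortcircuit_result(x_spam_status) is None
```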

> fourth, what about autolearn?  it's likely autolearn won't happen by
> default (too few hits, etc.) but it could also potentially learn the
> wrong way depending on what rules hit.  I'd like to get a consensus on
> what should happen here (IMO, autolearn is skipped for SC).

-1
I don't think shortcircuiting necessarily needs to block autolearning.
That can be driven by the rules that caused the shortcircuit.  For example,
say USER_IN_WHITELIST is the rule that shortcircuited: that rule already
mandates "noautolearn", so there's no need for the act of shortcircuiting
to do so, too.

Some extremely reliable shortcircuiting rules (especially ones that don't
rely on user/admin input) may well provide good data for autolearning, so
let's not rule it out.

By the way, the "autolearn the wrong way" scenario should already be
impossible; see lib/Mail/SpamAssassin/Plugin/AutoLearnThreshold.pm:

  dbg("learn: auto-learn? no: scored as ham but autolearn wanted spam");
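The guard behind that dbg() message can be sketched roughly as follows. This is a simplified, hypothetical Python rendering, not the code in AutoLearnThreshold.pm: the function name, its parameters, and the 0.1/12.0 thresholds are assumptions for illustration. It shows the two points made above: a rule flagged noautolearn (such as USER_IN_WHITELIST) vetoes learning on its own, and learning is refused whenever the autolearn direction disagrees with how the message was actually classified.

```python
from typing import Iterable, Optional, Set


def autolearn_decision(
    is_spam: bool,
    learn_score: float,
    hit_rules: Iterable[str],
    noautolearn_rules: Set[str],
    ham_threshold: float = 0.1,    # assumed value, for illustration
    spam_threshold: float = 12.0,  # assumed value, for illustration
) -> Optional[str]:
    """Return "spam", "ham", or None (do not autolearn).

    is_spam: the final classification of the message.
    learn_score: the score compared against the autolearn thresholds.
    """
    # A rule flagged noautolearn vetoes learning by itself, so
    # shortcircuiting does not need a separate check.
    if any(rule in noautolearn_rules for rule in hit_rules):
        return None

    if learn_score <= ham_threshold:
        wanted = "ham"
    elif learn_score >= spam_threshold:
        wanted = "spam"
    else:
        return None  # between the thresholds: not confident enough

    # Never learn in the opposite direction to the final classification.
    if wanted == "spam" and not is_spam:
        return None  # "scored as ham but autolearn wanted spam"
    if wanted == "ham" and is_spam:
        return None  # the symmetric case
    return wanted
```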


> fifth, what about AWL?  similar to autolearn, except that the score is
> going to be completely off.  IMO, AWL is skipped during SC.

+1
OK, good point; agreed.

> sixth, an easy addition to this patch would be short circuiting on
> current message score.  sometimes you don't care what rules hit, but you
> want to stop after a certain point.

-1
*No*. This is EXACTLY the approach that has been tried several times
before, with lousy results, and those results are why we concluded that
this approach is more worthwhile.  Let's not go around in circles!

It'll be easy enough to build additions on top of this simple change,
later.  We don't have to do everything *now*.

> seventh, and this is tricky, do we want to try moving the priorities
> around automatically when SC is enabled?  I was thinking of having code
> that sets default priorities, similar to default score, that goes
> through the SC rule listing and bumps the priority so they run first
> (including meta dependencies) unless the rule has a priority
> specifically set by config.  Then another loop which sets the default to
> priority 0.
> 
> eighth, should the SC decision code be in a plugin?

-1
No. There are a lot of issues around pluginizing parts of check(), and
they're being discussed in two other bugs at the same time, IIRC!  Let's
not confuse matters even more by making it three!  :(

This patch, in contrast, is pretty simple and small, and can be easily
refactored into a plugin *later* if desired, once the other pluginization
discussions are concluded with an agreement.

I think this is a great patch, adding some functionality that we *really*
*REALLY* need -- I know I, for one, have had to put in some upfront
black/whitelisting at the MTA level on one of my servers due to high load
issues, and this patch would allow me to avoid that kludge.

It does affect other parts of the SpamAssassin design, but let's not
get dragged into a "what colour to paint the bikeshed" discussion. I
really feel that if we were to insist on pluggable shortcircuiting
algorithms immediately, we'd be heading down that road; to be honest,
the existing check()-pluginization discussions certainly give that
impression.

Again, what I'm saying is, there's no need to get it "perfect" in the
first svn commit.

I'd suggest we apply this patch to svn trunk (OK, after fixing some or
all of the +1's above ;), and then open new bugs to deal with
evolutionary follow-on ideas...

--j.
