[Bug 6828] Adjust default autolearn settings to reduce Bayesian mistraining under default configuration

bugzilla-daemon Thu, 16 Aug 2012 05:55:49 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6828


Kevin A. McGrail <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
   Target Milestone|Undefined                   |3.4.0
            Summary|Adjust default autolearn    |Adjust default autolearn
                   |ham threshold to reduce     |settings to reduce Bayesian
                   |mistraining under default   |mistraining under default
                   |configuration               |configuration

--- Comment #10 from Kevin A. McGrail <[email protected]> ---
It does seem that lowering the threshold for learning as ham makes sense,
going by the anecdotal complaints, to try to avoid any FNs slipping through.
I think this is being extrapolated to a spam threshold change as well.

Anyone have suggestions on a testing protocol that might help decide the
defaults?  If I am thinking correctly, if we used masscheck data, the scoring
is designed not to mark spam as ham and ham as spam.  So the minimum threshold
should be the spam threshold.  This means that 12.0 was chosen more or less
arbitrarily, as an experienced guess at a number inflated for real-world
safety.
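
One possible shape for such a protocol, as a minimal sketch only: take a
hand-labelled corpus scored under the default configuration and count how many
messages each candidate threshold pair would autolearn into the wrong class.
The Scored type, the mistraining() helper and the corpus values are all
hypothetical and made up for illustration; nothing here is real masscheck
data, and the boundary handling (below the ham threshold, at or above the spam
threshold) is an assumption.

```python
from typing import NamedTuple, Sequence


class Scored(NamedTuple):
    score: float   # score the autolearner would see (hypothetical field)
    is_spam: bool  # hand-verified label


def mistraining(corpus: Sequence[Scored], ham_threshold: float,
                spam_threshold: float) -> dict:
    """Count messages autolearned per class and how many were mislearned."""
    # Assumption: learn as ham below the ham threshold,
    # learn as spam at or above the spam threshold.
    learned_ham = [m for m in corpus if m.score < ham_threshold]
    learned_spam = [m for m in corpus if m.score >= spam_threshold]
    return {
        "learned_ham": len(learned_ham),
        "spam_learned_as_ham": sum(m.is_spam for m in learned_ham),
        "learned_spam": len(learned_spam),
        "ham_learned_as_spam": sum(not m.is_spam for m in learned_spam),
    }


# Made-up corpus, for illustration only.
corpus = [
    Scored(-1.2, False), Scored(0.0, True), Scored(0.05, False),
    Scored(3.0, False), Scored(7.5, True), Scored(14.0, True),
]

# Compare the current default ham threshold with a lower candidate.
for ham_t in (0.1, -0.5):
    print(ham_t, mistraining(corpus, ham_t, 12.0))
```

Run against a real corpus scored with stock rules, the same counts would also
show how much ham is left to learn from once the ham threshold is lowered.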

Going further, my system is configured for 6.0 instead of 5.0 with a lot of
single-fire rules and things that focus on scoring ham.  So it doesn't make it
a good source of project-wide data concerning auto-learning thresholds.

In fact, I'm wondering a bit if a default setup can score below zero very
often and if we are now going to skew bayes towards only certain
classifications of ham.

And in the end, none of the data and configuration from our tweaked systems is
relevant to this discussion.


Looking at the thresholds, we really need a scientific approach based on the
DEFAULT configurations to continue this discussion.

bayes_auto_learn_threshold_nonspam n.nn   (default: 0.1)
bayes_auto_learn_threshold_spam n.nn      (default: 12.0)
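
To make the effect of these two defaults concrete, here is a simplified sketch
of the threshold decision.  It is not the plugin's actual code: the function
name autolearn_decision() is hypothetical, the exact boundary comparisons are
an assumption, and it leaves out the other conditions the real autolearner
applies (for example, the documented requirement of a minimum number of header
and body points before learning as spam).

```python
from typing import Optional


def autolearn_decision(score: float,
                       nonspam_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Return 'ham', 'spam', or None (not autolearned) for a message score."""
    # Assumption: strictly below the nonspam threshold learns as ham.
    if score < nonspam_threshold:
        return "ham"
    # Assumption: at or above the spam threshold learns as spam.
    if score >= spam_threshold:
        return "spam"
    # Everything in between is left alone.
    return None


for s in (-0.3, 0.1, 4.9, 6.0, 12.0):
    print(s, autolearn_decision(s))
```

With a required score of 5.0, this shows that mail scoring between 5.0 and
12.0 is tagged as spam but never autolearned, and that only mail scoring under
0.1 trains the ham side.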

And, in the end, I also wonder whether we should be turning on
bayes_auto_learn_on_error by default.  I think for 3.4.0 turning this setting
on and losing the backwards compatibility makes sense.

Regards,
KAM

-- 
You are receiving this mail because:
You are the assignee for the bug.
