http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5497
------- Additional Comments From [EMAIL PROTECTED] 2007-06-07 10:03 -------
(In reply to comment #32)
> My interpretation of the comments in bug 5257 is that people were reporting
> problems with a threshold of 0.1 because of too many low scoring spam being
> incorrectly learned so the ham threshold was lowered to -1.0.

It is not clear whether the problem is that spam is incorrectly learned as ham, or that too many learning operations are taking place on a heavily loaded system and it would be desirable to cut them down a bit. If the latter is the actual problem, I suggest using the method from comment #29 rather than changing the threshold.

The problem is that in our case we apparently get no messages below the auto_learn threshold of -1.0 at all. Even at 0.1 there are many ham messages not handed to auto_learn, because it is so easy to get above 0.1 when AWL and BAYES_00 are not counted. Rules like RDNS_NONE fire half of the time here, and HTML_MESSAGE almost always. This means that even a slight change in the scoring will easily disable autolearn=ham. This probably explains the change in behavior after installing the update to 3.2.0.

I will try the mentioned patch and see what a more reasonable value for the threshold is in our case.

I can understand that you want to avoid feed-forward lockups by excluding the score of BAYES_xx from the calculation, and to a lesser extent I can understand the exclusion of AWL, but taken together this makes auto_learn quite fragile.

Something that also affects our Bayes DB is that we are a locally operating company where over 99% of all mail is in Dutch. So the Bayes engine has learned over time that Dutch=HAM and English=SPAM. This normally works well, but when someone sends a message through a freemail provider that appends an English advertisement to every mail, and the message contains only an attachment with little body text, it is scored at BAYES_80 or more and lifted over our spam threshold by simple things like omitting the subject.
Those messages are never learned as ham, because those freemail providers invariably score points in the "ignorance" and "HTML" categories. So our Bayes DB never learns that "Choose the right car based on your needs. Check out Yahoo! Autos new Car Finder tool." does not really mean the message is SPAM.
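
To make the fragility concrete, here is a minimal Python sketch of the threshold check as described in this comment. It is not SpamAssassin's actual code: the function names are hypothetical, the rule scores are invented for illustration, and the real auto-learn logic has additional conditions that are left out. The ham thresholds 0.1 and -1.0 are the values discussed in this bug; the spam threshold of 12.0 is an assumption.

```python
# Simplified sketch of an auto-learn decision (NOT SpamAssassin's real logic).
# Rule names come from the comment; all scores below are made-up examples.

EXCLUDED_RULES = {"AWL"}          # excluded from the auto-learn score
EXCLUDED_PREFIXES = ("BAYES_",)   # BAYES_xx excluded to avoid self-reinforcement


def autolearn_score(hits):
    """Sum the scores of all hit rules except AWL and BAYES_xx."""
    return sum(
        score
        for rule, score in hits.items()
        if rule not in EXCLUDED_RULES and not rule.startswith(EXCLUDED_PREFIXES)
    )


def autolearn_decision(hits, ham_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'no' for a dict of {rule_name: score}."""
    score = autolearn_score(hits)
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:  # assumed spam threshold, simplified check
        return "spam"
    return "no"


# A clean message: only Bayes and AWL fire, so the auto-learn score is 0.
clean_ham = {"BAYES_00": -2.5, "AWL": -0.5}

# A typical ham here: two common rules push the auto-learn score above 0.1,
# even though the full score (-2.25) is clearly ham.
typical_ham = {"BAYES_00": -2.5, "AWL": -0.5, "HTML_MESSAGE": 0.25, "RDNS_NONE": 0.5}

for name, hits in (("clean_ham", clean_ham), ("typical_ham", typical_ham)):
    print(name,
          "full score:", sum(hits.values()),
          "auto-learn score:", autolearn_score(hits),
          "at 0.1:", autolearn_decision(hits, ham_threshold=0.1),
          "at -1.0:", autolearn_decision(hits, ham_threshold=-1.0))
```

With the threshold at -1.0, a message needs negative-scoring rules other than BAYES_xx and AWL to be learned as ham at all, which matches what we see here: nothing gets below it.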
