https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8094
Bug ID: 8094
Summary: Non balanced bayes ratio in db makes the accuracy plummet
Product: Spamassassin
Version: 4.0.0
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Learner
Assignee: dev@spamassassin.apache.org
Reporter: il...@foobar.fi
Target Milestone: Undefined

spamassassin-4.0.0-0.30.svn1903083

Does the SA Bayes implementation assume a 50-50 ham-spam ratio? We have been seeing poor accuracy on systems where the ratio is not balanced, but ranges between 97-3 and 83-17.

Is it possible to change that, or to make an alternative Bayes implementation that would also consider the probability according to the db ratio of tokens?

Here's an example of such a system where the ratio is not in balance.

$ sa-learn --dump magic
--
0.000          0          3          0  non-token data: bayes db version
0.000          0    6184682          0  non-token data: nspam
0.000          0   29523157          0  non-token data: nham
0.000          0    2225793          0  non-token data: ntokens
--
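
For reference, the ham-spam ratio of the example system can be read straight off the dump in the report. Below is a minimal Python sketch, assuming the count sits in the third whitespace-separated column of each "non-token data" line, as it does in the pasted output. The helper names `parse_dump_magic` and `spam_fraction` are made up for illustration and are not part of SpamAssassin.

```python
# Sketch only: compute the spam/ham balance from `sa-learn --dump magic` text.
# Assumption: the value of each "non-token data" entry is the third column,
# matching the output pasted in this report.

SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    6184682          0  non-token data: nspam
0.000          0   29523157          0  non-token data: nham
0.000          0    2225793          0  non-token data: ntokens
"""


def parse_dump_magic(text):
    """Return a dict mapping each non-token label (e.g. 'nspam') to its count."""
    counts = {}
    for line in text.splitlines():
        # Four numeric columns, then the free-text label as the fifth field.
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[4].startswith("non-token data:"):
            continue  # skip delimiters, blank lines, and real token rows
        label = parts[4].split(":", 1)[1].strip()
        counts[label] = int(parts[2])
    return counts


def spam_fraction(counts):
    """Fraction of learned messages that were spam."""
    nspam = counts["nspam"]
    nham = counts["nham"]
    return nspam / (nspam + nham)


counts = parse_dump_magic(SAMPLE)
frac = spam_fraction(counts)
print(f"spam: {frac:.1%}  ham: {1 - frac:.1%}")  # spam: 17.3%  ham: 82.7%
```

With the counts from the report this comes out at roughly 17% spam to 83% ham, i.e. the 83-17 case mentioned above.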