https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8094

            Bug ID: 8094
           Summary: Non balanced bayes ratio in db makes the accuracy
                    plummet
           Product: Spamassassin
           Version: 4.0.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Learner
          Assignee: dev@spamassassin.apache.org
          Reporter: il...@foobar.fi
  Target Milestone: Undefined

spamassassin-4.0.0-0.30.svn1903083

Does the SA Bayes implementation assume a 50-50 ham-spam ratio?

We have been seeing poor accuracy on systems where the ham-spam ratio is not
balanced but instead ranges between 97-3 and 83-17.

Is it possible to change that, or to provide an alternative Bayes implementation
that would also weight the probability according to the ham-spam ratio in the db?
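
To make the question concrete, here is a minimal sketch of the two behaviours
being contrasted. It is a generic illustration, not taken from the SpamAssassin
source: both function names are hypothetical, and whether SA actually behaves
like the first one is exactly what this report is asking. The counts used are
the nspam and nham values from the sa-learn dump in this report.

```python
def token_spam_prob_balanced(spam_hits, ham_hits, nspam, nham):
    """Per-token spam probability with hit counts normalized per class.

    Normalizing by class size is equivalent to assuming a 50-50 prior.
    Hypothetical helper for illustration only.
    """
    spam_rate = spam_hits / nspam
    ham_rate = ham_hits / nham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


def token_spam_prob_with_prior(spam_hits, ham_hits, nspam, nham):
    """Same token, but each class is weighted by its share of the db."""
    prior_spam = nspam / (nspam + nham)
    prior_ham = 1.0 - prior_spam
    spam_term = (spam_hits / nspam) * prior_spam
    ham_term = (ham_hits / nham) * prior_ham
    if spam_term + ham_term == 0:
        return prior_spam  # token never seen: fall back to the prior
    return spam_term / (spam_term + ham_term)


# Message counts from the dump in this report.
NSPAM = 6184682
NHAM = 29523157

# A made-up token seen in 1000 spam and 1000 ham messages.
balanced = token_spam_prob_balanced(1000, 1000, NSPAM, NHAM)
with_prior = token_spam_prob_with_prior(1000, 1000, NSPAM, NHAM)
print(f"balanced: {balanced:.3f}  with prior: {with_prior:.3f}")
```

For this made-up token the class-normalized estimate comes out near 0.83,
while the prior-weighted estimate is 0.5, which shows how far the two can
diverge on an unbalanced db.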

Here's an example of such a system, where the ratio is not in balance:
$ sa-learn --dump magic
--
0.000          0          3          0  non-token data: bayes db version
0.000          0    6184682          0  non-token data: nspam
0.000          0   29523157          0  non-token data: nham
0.000          0    2225793          0  non-token data: ntokens
--
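
For reference, the ham-spam ratio implied by this dump can be worked out with a
short script. This is only a sketch: parse_dump_magic is a hypothetical helper,
and it assumes the value of each "non-token data" line sits in the third
column, as it does in the output above.

```python
def parse_dump_magic(text):
    """Return {label: value} for the 'non-token data' lines of a dump.

    Assumes the value is the third whitespace-separated column.
    """
    values = {}
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[4].startswith("non-token data:"):
            continue  # skip delimiters and anything that is not a magic line
        label = parts[4].split(":", 1)[1].strip()
        values[label] = int(parts[2])
    return values


DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    6184682          0  non-token data: nspam
0.000          0   29523157          0  non-token data: nham
0.000          0    2225793          0  non-token data: ntokens
"""

magic = parse_dump_magic(DUMP)
total = magic["nspam"] + magic["nham"]
ham_share = magic["nham"] / total
spam_share = magic["nspam"] / total
print(f"ham {ham_share:.1%}, spam {spam_share:.1%}")
```

With the counts above this gives roughly 83-17 ham-spam, i.e. the more
balanced end of the range mentioned earlier.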
