https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6963

            Bug ID: 6963
           Summary: Anybody cares for a saved millisecond or two in
                    computing bayes probabilities for tokens?
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
          Assignee: [email protected]
          Reporter: [email protected]

Wondering where the 'b_comp_prob' timing entry spends its time
(computing Bayes probabilities for tokens), I played a bit with the
beautiful NYTProf Perl profiler and shuffled some Bayes code around
while keeping its functionality unchanged.

The basic idea is to compute the probabilities for all tokens in one
go, instead of calling _compute_prob_for_token() once per token.
This allows the loop-invariant work to be factored out of the loop.
So instead of:
  Plugin::Bayes::_compute_prob_for_token
we now call:
  Plugin::Bayes::_compute_prob_for_all_tokens
(_compute_prob_for_token() is now just a wrapper around it).
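The shape of the change can be sketched as follows. This is a Python
illustration of the hoisting idea, not the actual Perl patch: the
function names mirror the ones above, but the probability formula
shown (a Robinson-style estimate with constants s and x) and all
parameter names are illustrative assumptions.

```python
# Illustrative sketch: batch the per-token Bayes probability computation
# so that quantities which do not depend on the token (corpus message
# counts, their reciprocals, the Robinson s and x constants) are computed
# once, outside the loop, instead of on every call.

def compute_prob_for_all_tokens(tokens, nspam, nham, s=0.160, x=0.538):
    """tokens: dict mapping token -> (spam_count, ham_count).
    Returns a dict mapping token -> spam probability.
    nspam/nham are the total learned spam/ham message counts.
    All names and the s/x defaults are illustrative."""
    probs = {}
    # loop-invariant work, hoisted out of the per-token loop
    inv_nspam = 1.0 / max(nspam, 1)
    inv_nham = 1.0 / max(nham, 1)
    sx = s * x
    for tok, (sc, hc) in tokens.items():
        ratio_s = sc * inv_nspam
        ratio_h = hc * inv_nham
        if ratio_s + ratio_h == 0:
            continue  # token never seen; no probability to report
        pw = ratio_s / (ratio_s + ratio_h)    # raw frequency estimate
        n = sc + hc
        probs[tok] = (sx + n * pw) / (s + n)  # Robinson-smoothed estimate
    return probs

def compute_prob_for_token(tok, counts, nspam, nham):
    # the single-token entry point becomes a thin wrapper
    return compute_prob_for_all_tokens({tok: counts}, nspam, nham)[tok]
```

The saving comes purely from doing the invariant arithmetic once per
message rather than once per token; the per-token math is unchanged,
so results are identical to the wrapper-per-token version.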

Savings are less than I hoped, about 1.2 ms for a typical larger
message with one or two hundred tokens, and a barely noticeable
speedup for messages with only a few tokens. When dumping tokens
(sa-learn --dump) the saving is about 6 seconds (out of one minute)
with my current redis database.

Still, the work is done now; I wonder whether we would like it
folded in or not.

-- 
You are receiving this mail because:
You are the assignee for the bug.