https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6963
Bug ID: 6963
Summary: Anybody cares for a saved millisecond or two in
computing bayes probabilities for tokens?
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Libraries
Assignee: [email protected]
Reporter: [email protected]
Wondering where the time reported under the 'b_comp_prob' timing entry goes
(computing Bayes probabilities for tokens), I played a bit with the
beautiful NYTProf Perl profiler and shuffled some of the Bayes code around
while keeping its functionality unchanged.
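For anyone who wants to reproduce the profiling, a minimal Devel::NYTProf
run along these lines should do (the message file name is just a
placeholder, not the exact invocation I used):

  # scan one message under the profiler; writes ./nytprof.out
  perl -d:NYTProf ./spamassassin < sample.eml > /dev/null
  # turn the dump into a browsable HTML report under ./nytprof/
  nytprofhtml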
The basic idea is to compute probabilities for all tokens in one
go, instead of calling _compute_prob_for_token() once per token.
This allows loop-invariant sections to be factored out of the per-token loop.
So instead of:
Plugin::Bayes::_compute_prob_for_token
we now call:
Plugin::Bayes::_compute_prob_for_all_tokens
(and _compute_prob_for_token() is now just a wrapper around it; see the
sketch below).
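A simplified sketch of the new shape (not the actual patch; the probability
formula and data layout below are placeholders for the real code in
Plugin/Bayes.pm):

  # batched variant: each element of @$tokensdata is
  # [ $token, $spam_count, $ham_count ]; $ns/$nn are the learned
  # spam/ham message totals
  sub _compute_prob_for_all_tokens {
    my ($self, $tokensdata, $ns, $nn) = @_;
    return [] if !$ns || !$nn;

    # loop-invariant work is done once, outside the per-token loop
    my $threshold = $self->{use_hapaxes} ? 1 : 10;

    my @probabilities;
    foreach my $tokendata (@$tokensdata) {
      my ($s, $n) = ($tokendata->[1], $tokendata->[2]);
      my $prob;
      if ($s + $n >= $threshold) {
        my $ratio_s = $s / $ns;
        my $ratio_n = $n / $nn;
        $prob = $ratio_s / ($ratio_s + $ratio_n);  # placeholder formula
      }
      push(@probabilities, $prob);
    }
    return \@probabilities;
  }

  # the old per-token entry point reduces to a thin wrapper
  sub _compute_prob_for_token {
    my ($self, $token, $ns, $nn, $s, $n) = @_;
    my $prob_ref =
      $self->_compute_prob_for_all_tokens([ [$token, $s, $n] ], $ns, $nn);
    return $prob_ref->[0];
  }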
The savings are smaller than I had hoped: about 1.2 ms for a typical larger
message with one or two hundred tokens, and a barely noticeable
speedup for messages with only a few tokens. When dumping tokens
(sa-learn --dump) the saving is about 6 seconds out of one minute
with my current Redis database.
Still, the work is done now; I wonder whether we would like it folded in
or not.
--
You are receiving this mail because:
You are the assignee for the bug.