https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6972
Bug ID: 6972
Summary: BayesStore/Redis.pm: ditching CPAN module Redis gains
speed, avoids its bugs, works on IPv6
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Libraries
Assignee: [email protected]
Reporter: [email protected]
Considering that the CPAN module Redis (implementing client-side
of the redis protocol in pure perl) is:
- quite inefficient due to its baroque API design (e.g. PR #46, #49)
(e.g. poor batching, use of callbacks, use of ungetc);
- the project seems to have stalled - no response to problem reports
for the last four months, last release from January;
- has some bugs affecting its use in SpamAssassin (#38),
whose workaround is only available since version 1.956 (see
Bug 6965) which some Linux distributions haven't picked up yet;
- does not support IPv6, which is available since redis 2.8.0;
- adds one more module dependency to SpamAssassin; ...
I decided to try my hand at re-writing the client-side of the redis
protocol from scratch, with the intent of avoiding bottlenecks
which have been indicated by profiling our usage and benchmarks
with a perl profiler. The redis protocol is fairly simple, so
the solution came out as 200 lines of perl code, including
necessary error handling and comments. This code is now included
in amavisd-new version 2.8.2, where I faced the same problems.
The BayesStore/Redis.pm module needed a couple of fairly
straightforward adjustments, often in a form of simplifications,
and during the few weeks since I'm running this code it proved
itself to be fast and robust. Works well over IPv6 too :)
The code paths involving a Lua scripting in a redis server are
still faster than the non-Lua case, but due to the now efficient
command batching (pipelining) code, the non-Lua case comes fairly
close to the Lua case. In two Bayes function they turned out to be
truly the same speed, which eliminated a need for two Lua functions:
multi_expire_script and multi_hincrby.
So despite including the full redis protocol support, the resulting
BayesStore/Redis.pm still has fewer lines than the current version
we have, and saves a few bytes of memory by not implementing what
we don't need (like message passing / subscriptions).
Perhaps the most illustrative case comes from timing of the
'sa-learn --backup' usage. While this is a marginal and non-critical
usage case, the same ratio applies to other code sections (like
multi-token lookups, as proved by measuring elapsed times on a
production mailer).
Benchmark:
elapsed time needed for 'sa-learn --backup',
800.000 real-life tokens, 2200 'seen' entries:
- original CPAN Redis module, with Lua: 37 seconds
- new protocol client code, with Lua: 26 seconds
- original CPAN Redis module, no Lua: 88 seconds
- new protocol client code, no Lua: 37 seconds
As can be seen, the batched classical use is now more than
twice the speed, while the Lua case also benefited by some 50%.
So I'd like to fold in the replaced BayesStore/Redis.pm, if
you don't mind, and un-document a dependency on the CPAN module.
The database itself is unchanged, so full compatibility is retained.
Due to the number of diffs which makes a review difficult,
I'll attach the full replacement Redis.pm, not a patch.
--
You are receiving this mail because:
You are the assignee for the bug.