Theo Van Dinter writes:
> On Sun, Aug 27, 2006 at 06:01:04PM -0400, Theo Van Dinter wrote:
> > On Sun, Aug 27, 2006 at 10:45:38PM +0100, Justin Mason wrote:
> > > hey dude -- looks very cool. How many machines have you tried it on
> > > so far?
> >
> > :) None, yet.
> [...]
>
> Ok, the code is generally usable at this point -- I can run the server,
> plus clients on both machines that I have, and the results all come
> out the same. I need to still clean up the mass-check code a bit and
> document some more, but I think it's beta-ish at the moment. :)
>
> Some preliminary results with a small set of ~4900 messages:
>
> Original version (trunk w/ -j2): 7:42 (5.3/cpu/sec)
> New version in normal mode (no client/server w/ -j2): 7:33 (5.4/cpu/sec)
> Client/Server w/ just remote client w/ -j2: 10:08 (4.0/cpu/sec)
> Client/Server w/ local and remote clients w/ -j2: 5:08 (8.0/cpu/sec)
>
> I should be able to tweak up the remote speed some more -- this was w/
> the defaults of max 1000 messages per run. Having a small dataset, and a
> limited number of messages per run causes the overhead to be a larger
> percentage of time than it would be w/ more messages and more messages per
> run.
>
>
> BTW: We should figure out why the throughput is so crappy now though.
> As I recall, I was able to push through 10-12 msgs/cpu/sec on earlier
> versions of SA.

I've been working on optimization -- the last two checkins should save about
7.5% of runtime on a typical mass-check. But overall, trunk is definitely
slower than, say, 3.0.x was. That's simply because there are many more
complex rules -- the ReplaceTags rules in particular are complex, and there
are a lot of body rules now. Here's how the profile currently looks:

Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 59.8   11.23 11.587    223   0.0504 0.0520  Mail::SpamAssassin::PerMsgStatus::_body_tests_0
 4.10   0.769  0.866    223   0.0034 0.0039  Mail::SpamAssassin::PerMsgStatus::_body_uri_tests_0
 3.72   0.699  1.165    223   0.0031 0.0052  Mail::SpamAssassin::PerMsgStatus::_head_tests_0
 2.47   0.464  0.477   1197   0.0004 0.0004  Mail::SpamAssassin::Message::Metadata::parse_received_line
 1.68   0.315  0.469    223   0.0014 0.0021  Mail::SpamAssassin::PerMsgStatus::_meta_tests_500
 1.54   0.289  0.000    413   0.0007 0.0000  utf8::SWASHNEW
 1.17   0.220  0.237   3754   0.0001 0.0001  Text::Wrap::wrap
 1.12   0.210  0.209    223   0.0009 0.0009  Mail::SpamAssassin::PerMsgStatus::_rawbody_tests_0

Note how the body rules (_body_tests_0) account for 59.8% of the runtime.

I've been looking at various ways to speed those up, using the new
perl 5.9.x trie regexps or Matt's re2xs tool. But neither really
makes a massive difference, because they can't support the entire
perl regexp language, so there are a lot of rules that they can't
speed up.

I think the only way to speed those up is to reduce the size of the
ruleset, and particularly the number of rules using slow regexp code like
/.*/ or /.{0,n}/.
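
As a rough sketch of how one might list the rules that use those
constructs: the snippet below is Python rather than Perl, purely for
illustration, and the rule names, patterns and the detection heuristic are
all made up here, not anything in SA. It only flags an unescaped dot
followed by *, + or a {m,n} range, so it can be fooled by a dot inside a
character class or by an escaped backslash before the dot.

```python
import re

# Heuristic (an assumption for this sketch): a "wildcard run" is an
# unescaped dot followed by *, + or a {m,n}-style range.
WILDCARD_RUN = re.compile(r"""
    (?<!\\)\.            # a dot that is not preceded by a backslash
    (?: [*+]             # followed by a star or plus quantifier
      | \{\d*,\d*\}      # or by a brace range with a comma in it
    )
""", re.VERBOSE)


def slow_rules(rules):
    """Return the sorted names of rules whose pattern has a wildcard run."""
    return sorted(name for name, pattern in rules.items()
                  if WILDCARD_RUN.search(pattern))


# Hypothetical rule names and patterns, for illustration only.
rules = {
    "EXAMPLE_LITERAL": r"stock alert",
    "EXAMPLE_DOTSTAR": r"buy.*now",
    "EXAMPLE_BOUNDED": r"price.{0,40}target",
    "EXAMPLE_ESCAPED": r"example\.com",
}

print(slow_rules(rules))
```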

...Talking of which... the slowest rule by quite a long way in the
current trunk ruleset is TVD_STOCK1... any chance you could take a look
at simplifying it? ;)

--j.