Theo Van Dinter writes:
> On Sun, Aug 27, 2006 at 06:01:04PM -0400, Theo Van Dinter wrote:
> > On Sun, Aug 27, 2006 at 10:45:38PM +0100, Justin Mason wrote:
> > > hey dude -- looks very cool.  How many machines have you tried it on
> > > so far?
> > 
> > :)  None, yet.
> [...]
> 
> Ok, the code is generally usable at this point -- I can run the server,
> plus clients on both machines that I have, and the results all come
> out the same.  I still need to clean up the mass-check code a bit and
> document some more, but I think it's beta-ish at the moment. :)
> 
> Some preliminary results with a small set of ~4900 messages:
> 
> Original version (trunk w/ -j2):                      7:42  (5.3/cpu/sec)
> New version in normal mode (no client/server w/ -j2): 7:33  (5.4/cpu/sec)
> Client/Server w/ just remote client w/ -j2:           10:08 (4.0/cpu/sec)
> Client/Server w/ local and remote clients w/ -j2:     5:08  (8.0/cpu/sec)
> 
> I should be able to tweak up the remote speed some more -- this was w/
> the default maximum of 1000 messages per run.  Having a small dataset and
> a limited number of messages per run makes the overhead a larger
> percentage of the total time than it would be w/ more messages overall
> and more messages per run.
> 
> 
> BTW: We should figure out why the throughput is so crappy now though.
> As I recall, I was able to push through 10-12 msgs/cpu/sec on earlier
> versions of SA.
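The per-CPU rates in the quoted results follow from elapsed time, message count and job count. Here is a minimal Python sketch that reproduces them. It assumes exactly 4900 messages (the post says "~4900") and divides every row by 2, matching -j2. That divisor is an inference from the numbers, not something the message states.

```python
# Recompute the "msgs/cpu/sec" column from the quoted wall-clock times.
# Assumptions: exactly 4900 messages, and a divisor of 2 CPUs for every row
# (inferred because it reproduces all four quoted figures).

MESSAGES = 4900
CPUS = 2

RUNS = {
    "trunk -j2": "7:42",
    "new, normal mode -j2": "7:33",
    "remote client only -j2": "10:08",
    "local + remote clients -j2": "5:08",
}


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def per_cpu_rate(mmss: str, messages: int = MESSAGES, cpus: int = CPUS) -> float:
    """Messages per CPU per second, rounded to one decimal place."""
    return round(messages / to_seconds(mmss) / cpus, 1)


for name, elapsed in RUNS.items():
    print(f"{name:28s} {elapsed:>6s}  {per_cpu_rate(elapsed)}/cpu/sec")
```

Run as-is, it prints 5.3, 5.4, 4.0 and 8.0 for the four configurations, matching the quoted table.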

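The overhead argument in the quoted message can be made concrete with a simple cost model. This Python sketch assumes, hypothetically, that a check pays a one-time startup cost, a fixed cost for each run of up to `batch` messages, and a per-message scanning cost. The real mass-check client/server costs were not measured in this thread, so the parameter values are invented:

```python
import math

# Hypothetical cost model: one-time startup + fixed cost per run of at most
# `batch` messages + per-message scanning cost. Not measured from mass-check.


def overhead_fraction(messages: int, batch: int, startup: float,
                      run_overhead: float, per_msg: float) -> float:
    """Share of total wall-clock time that is overhead rather than scanning."""
    runs = math.ceil(messages / batch)
    overhead = startup + runs * run_overhead
    return overhead / (overhead + messages * per_msg)


# Illustrative values only: 5 s startup, 2 s per run, 0.2 s per message.
for messages, batch in [(4900, 1000), (4900, 5000), (49000, 5000)]:
    share = overhead_fraction(messages, batch, 5.0, 2.0, 0.2)
    print(f"{messages:6d} msgs, batch {batch:5d}: {share:.2%} overhead")
```

Under this model a bigger batch cuts the per-run term, and a bigger corpus dilutes the startup term, which is the effect the quoted paragraph describes.
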
I've been working on optimization -- the last two checkins should save about
7.5% of runtime on a typical mass-check.  But overall, trunk is definitely
slower than, let's say, 3.0.x was.  This is simply because there are many more
complex rules -- the ReplaceTags rules in particular are complex, and there are
a lot of body rules now.  Here's how the profile currently looks:

Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 59.8   11.23 11.587    223   0.0504 0.0520  Mail::SpamAssassin::PerMsgStatus::_body_tests_0
 4.10   0.769  0.866    223   0.0034 0.0039  Mail::SpamAssassin::PerMsgStatus::_body_uri_tests_0
 3.72   0.699  1.165    223   0.0031 0.0052  Mail::SpamAssassin::PerMsgStatus::_head_tests_0
 2.47   0.464  0.477   1197   0.0004 0.0004  Mail::SpamAssassin::Message::Metadata::parse_received_line
 1.68   0.315  0.469    223   0.0014 0.0021  Mail::SpamAssassin::PerMsgStatus::_meta_tests_500
 1.54   0.289  0.000    413   0.0007 0.0000  utf8::SWASHNEW
 1.17   0.220  0.237   3754   0.0001 0.0001  Text::Wrap::wrap
 1.12   0.210  0.209    223   0.0009 0.0009  Mail::SpamAssassin::PerMsgStatus::_rawbody_tests_0


Note how body rules account for 59.8% of the runtime.

I've been looking at various ways to speed those up with the new
perl 5.9.x trie regexps or Matt's re2xs tool.  But neither really
makes a massive difference: neither supports the entire perl regexp
language, so there are a lot of rules they can't speed up.
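Here is a toy Python sketch of the trie idea, roughly what perl's trie optimization does for alternations of literal strings. It is not perl's implementation and not re2xs. Literal words are merged into one prefix tree, so a single walk from each offset checks all of them at once. The word list is made up. A rule like /stock.{0,40}alert/ has no fixed spelling to store in such a tree, which is why rules of that shape stay on the general regexp engine.

```python
from typing import Dict, List, Tuple

END = ""  # never equal to a single character of text, so safe as an end marker


def build_trie(words: List[str]) -> Dict:
    """Merge literal strings into one nested-dict prefix tree."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # remember which word ends at this node
    return root


def trie_search(trie: Dict, text: str) -> List[Tuple[int, str]]:
    """Report (offset, word) for every occurrence of any stored word."""
    hits = []
    for start in range(len(text)):
        node = trie
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break  # no stored word continues with this character
            if END in node:
                hits.append((start, node[END]))
    return hits


sample = build_trie(["viagra", "vicodin", "valium"])
print(trie_search(sample, "cheap vicodin and viagra"))
# [(6, 'vicodin'), (18, 'viagra')]
```

The shared prefix "vi" is walked once for both "viagra" and "vicodin". A general regexp engine that tries each alternative separately would repeat that work.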

I think the only way to speed those up is to reduce the size of the
ruleset, and particularly the number of rules using slow regexp code like
/.*/ or /.{0,n}/.
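Here is a Python sketch showing why /.{0,n}/ can be expensive. It models a rule shaped like /X.{0,n}Y/, with X and Y literal strings, on a naive backtracking engine, and counts how often Y gets tested. This is a simplified model, not a trace of perl's engine. Perl's optimizer can skip some of this work, for example by checking for required literal substrings first. The sample rule and text are made up. Python's `re` module, also a backtracking engine, is used only to confirm the modeled match positions.

```python
import re
from typing import Optional, Tuple


def gap_attempts(text: str, x: str, y: str,
                 n: int) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Model /X.{0,n}Y/ (X, Y literal) on a naive backtracking engine.

    Returns (number of times Y was tested, span of first match or None).
    """
    attempts = 0
    for i in range(len(text)):
        if not text.startswith(x, i):
            continue  # X fails here, move on to the next start offset
        gap_start = i + len(x)
        # Greedy phase: grab up to n characters; '.' stops at '\n' (no /s).
        limit = 0
        while (limit < n and gap_start + limit < len(text)
               and text[gap_start + limit] != "\n"):
            limit += 1
        # Backtracking phase: test Y after the longest gap, then shrink it.
        for k in range(limit, -1, -1):
            attempts += 1
            if text.startswith(y, gap_start + k):
                return attempts, (i, gap_start + k + len(y))
    return attempts, None


def regex_span(text: str, x: str, y: str, n: int) -> Optional[Tuple[int, int]]:
    """Same search with Python's own backtracking engine, as a cross-check."""
    m = re.search(re.escape(x) + ".{0,%d}" % n + re.escape(y), text)
    return m.span() if m else None


# Made-up text: "stock" is present, but only on the line after all the "buy"s.
noisy = "buy " * 50 + "\n" + "stock"
for n in (10, 40):
    tests, span = gap_attempts(noisy, "buy", "stock", n)
    assert span == regex_span(noisy, "buy", "stock", n)
    print(f".{{0,{n}}}: {tests} Y tests, match={span}")
```

With the sample text, n=10 costs 535 Y tests and n=40 costs 1840, and neither finds a match. Each place X matches can pay up to n+1 failed tests of Y.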


...Talking of which...  the slowest rule in the current trunk ruleset, by
quite a long way, is TVD_STOCK1.  Any chance you could take a look at
simplifying it? ;)

--j.
