http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4691

------- Additional Comments From [EMAIL PROTECTED]  2005-11-18 01:24 -------
I have yet to do performance testing on this, but I plan to.

The performance testing I did previously was when I implemented a 'tflag fast'
method. It basically took two config values, fast_body_max_bytes and
fast_rawbody_max_bytes, did a substr(content, 0, <max_bytes>), and performed a
single regexp test against that block of content instead of running the regexp
once per line.
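A minimal Python sketch of the two strategies just described, to make the trade-off concrete. SpamAssassin itself is Perl. The helper names (match_per_line, match_fast) and the sample rule are hypothetical and are not taken from the patch. Characters stand in for bytes because the sample text is ASCII. The sketch also shows the hit-rate cost: a match that falls past max_bytes is missed.

```python
import re

# Hypothetical sketch of per-line vs. single-block testing; not SpamAssassin code.
# Characters stand in for bytes here because the sample text is ASCII.


def match_per_line(pattern: re.Pattern, lines: list[str]) -> bool:
    """Default behaviour: run the regexp once against each line."""
    return any(pattern.search(line) for line in lines)


def match_fast(pattern: re.Pattern, lines: list[str], max_bytes: int) -> bool:
    """'tflag fast' style: join the content, keep only the first max_bytes
    (like substr(content, 0, max_bytes)) and run a single regexp test."""
    block = "\n".join(lines)[:max_bytes]
    return pattern.search(block) is not None


if __name__ == "__main__":
    rule = re.compile(r"cheap\s+pills", re.IGNORECASE)
    filler = ["filler text"] * 1000           # about 12 KB once joined
    early = ["buy cheap pills now"] + filler  # hit near the start
    late = filler + ["cheap pills"]           # hit past the first 4 KB

    print(match_per_line(rule, early), match_fast(rule, early, 4096))  # True True
    print(match_per_line(rule, late), match_fast(rule, late, 4096))    # True False
    print(match_fast(rule, late, 262144))                              # True
```

In the joined block, `\s+` can also match a newline. A single test can therefore hit phrases split across lines that per-line testing never sees.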

The speedup was quite good, depending on the max_bytes values that were set.
Even when max_bytes for body and rawbody were both set to 256 KB, a single
regexp test was slightly faster. I'll just paste those results here. It was
fairly crude testing, but it gave me some hope to continue. :)

* no tflags fast on body/rawbody rules
---------------------------------------------------------------
total scantime: 570.591863 seconds
number of files scanned: 740
avg scantime: 0.7707657 seconds
---------------------------------------------------------------

And now with 'tflag fast' set on all body and rawbody rules:

* fast_body_max_bytes    262144
* fast_rawbody_max_bytes 262144
---------------------------------------------------------------
total scantime: 552.70254 seconds
number of files scanned: 740
avg scantime: 0.7465846 seconds
---------------------------------------------------------------

* fast_body_max_bytes     8192 
* fast_rawbody_max_bytes 32768
---------------------------------------------------------------
total scantime: 470.123985 seconds
number of files scanned: 740
avg scantime: 0.6349819 seconds
---------------------------------------------------------------
 
* fast_body_max_bytes     4096
* fast_rawbody_max_bytes 16384
---------------------------------------------------------------
total scantime: 454.957886 seconds
number of files scanned: 740
avg scantime: 0.6145058 seconds
---------------------------------------------------------------

As you can see, a 4 KB body and 16 KB rawbody limit resulted in about a 20%
speedup, but that was with the entire ruleset running in 'tflag fast' mode. In
reality that would never be done because of the loss in hit rates. The point
here is that the single regexp test still proves faster at the 256 KB level.
That said, I'm sure some people run spamc -s with values larger than 256 KB,
and at some size above that, performance will get worse.
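The percentages can be checked directly against the totals above. This short Python snippet uses illustrative variable names. It computes how much total scan time each run saved relative to the baseline, which is what "speedup" means here. The result is roughly 3.1% at 256 KB/256 KB, 17.6% at 8 KB/32 KB, and 20.3% at 4 KB/16 KB.

```python
# Total scan times (seconds, 740 files per run) copied from the results above.
BASELINE = 570.591863  # no 'tflag fast'
RUNS = {
    "256 KB body / 256 KB rawbody": 552.70254,
    "8 KB body / 32 KB rawbody": 470.123985,
    "4 KB body / 16 KB rawbody": 454.957886,
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of total scan time saved relative to the baseline."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for label, total in RUNS.items():
        print(f"{label}: {reduction_pct(BASELINE, total):.1f}% less scan time")
```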

So, to Justin's point, the growth in the function's size with each rule does
affect the memory footprint, for one. Hopefully it doesn't affect the speed.
And that footprint will grow further as custom rules/rule sets are added.

Memory usage, stock svn:

7658 59.0  1.7 22692 18480 ?       Ss   17:39   0:00 /usr/bin/spamd -d

Memory usage, svn + tmethod patch:

7593  0.4  1.8 24556 18956 ?       Ss   17:37   0:00 /usr/bin/spamd -d
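To put a number on the footprint change, the two ps lines can be diffed. This sketch assumes a column layout that is inferred rather than stated in the comment: `ps aux` output with the USER column removed (PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). It also assumes VSZ and RSS are in KiB, as procps ps reports them. Under those assumptions, the patch adds 1864 KiB of VSZ (about 8.2%) and 476 KiB of RSS (about 2.6%).

```python
# Assumed column layout: `ps aux` output with the USER column removed.
# VSZ and RSS are taken to be in KiB (procps ps conventions).
FIELDS = ["pid", "cpu", "mem", "vsz", "rss", "tty", "stat", "start", "time"]


def parse_ps_line(line: str) -> dict:
    """Split a ps line into named fields; everything after TIME is the command."""
    parts = line.split(None, len(FIELDS))
    row = dict(zip(FIELDS, parts))
    row["command"] = parts[len(FIELDS)]
    return row


STOCK = parse_ps_line("7658 59.0  1.7 22692 18480 ?       Ss   17:39   0:00 /usr/bin/spamd -d")
PATCHED = parse_ps_line("7593  0.4  1.8 24556 18956 ?       Ss   17:37   0:00 /usr/bin/spamd -d")

if __name__ == "__main__":
    for key in ("vsz", "rss"):
        before, after = int(STOCK[key]), int(PATCHED[key])
        delta = after - before
        print(f"{key.upper()}: +{delta} KiB ({100.0 * delta / before:.1f}%)")
```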

To Duncan's point: if you "don't use any tmethod rules", then I guess there is no
point in adding this to the core, unless the point is to extend the
functionality for rule writers. Actually, extending rule-writing functionality
for SARE is the main reason I'm doing this. The main target is multiline rawbody
rules, which are impossible to write efficiently now because running /s regexps
(where . also matches newlines) over the full rawbody is just too expensive.
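A small Python illustration of why multiline rawbody rules need a single /s-style test rather than per-line matching. The rule, the sample HTML, and the helper name block_hit are hypothetical. `re.DOTALL` is Python's counterpart to Perl's /s. The bounded `{0,200}` gaps and the max_bytes prefix are one assumed way to keep the cost of such a test down. They are not taken from the patch.

```python
import re

# Hypothetical multiline rawbody rule: a hidden <div> wrapping a spam phrase,
# with the pieces on separate lines. re.DOTALL lets '.' match '\n' (Perl /s).
# Bounded gaps ({0,200}) limit how far the regex engine can backtrack.
RULE = re.compile(r"<div[^>]*>.{0,200}?free\s+money.{0,200}?</div>",
                  re.DOTALL | re.IGNORECASE)

RAW_LINES = [
    '<div style="display:none">',
    "free money",
    "</div>",
]


def block_hit(pattern: re.Pattern, lines: list[str], max_bytes: int) -> bool:
    """Run one DOTALL test over the first max_bytes characters of the joined
    rawbody (characters stand in for bytes; the sample is ASCII)."""
    return pattern.search("\n".join(lines)[:max_bytes]) is not None


if __name__ == "__main__":
    # Per-line matching never sees the tag, phrase and closing tag together.
    print(any(RULE.search(line) for line in RAW_LINES))  # False
    # A single test over the joined block matches across the line breaks.
    print(block_hit(RULE, RAW_LINES, 16384))  # True
```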

dallas