http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4691
------- Additional Comments From [EMAIL PROTECTED]  2005-11-18 01:24 -------

i have yet to do performance testing on this, but i plan to.

the performance testing i did previously was when i implemented a 'tflag fast' method, which basically just took a config value for fast_body_max_bytes and fast_rawbody_max_bytes, did a substr(content,0,<max_bytes>) and performed a single regexp test against that block of content, instead of a per-line regexp.

the speedup was quite good depending on the max_bytes values that were set. even when the max_bytes for body and rawbody were set to 256kb, it was slightly faster to use a single regexp test.

i'll just paste those results here.. it was sorta crude testing, but gave me some hope to continue. :)

* no tflags fast on body/rawbody rules
---------------------------------------------------------------
total scantime: 570.591863 seconds
number of files scanned: 740
avg scantime: 0.7707657 seconds
---------------------------------------------------------------

And now with 'tflag fast' set on all body and rawbody rules....

* fast_body_max_bytes 262144
* fast_rawbody_max_bytes 262144
---------------------------------------------------------------
total scantime: 552.70254 seconds
number of files scanned: 740
avg scantime: 0.7465846 seconds
---------------------------------------------------------------

* fast_body_max_bytes 8192
* fast_rawbody_max_bytes 32768
---------------------------------------------------------------
total scantime: 470.123985 seconds
number of files scanned: 740
avg scantime: 0.6349819 seconds
---------------------------------------------------------------

* fast_body_max_bytes 4096
* fast_rawbody_max_bytes 16384
---------------------------------------------------------------
total scantime: 454.957886 seconds
number of files scanned: 740
avg scantime: 0.6145058 seconds
---------------------------------------------------------------

as you can see, a 4kb body and 16kb rawbody resulted in about a 20% speedup...
but this was against the entire ruleset running in tflag fast mode. in reality, that's not something that would ever be done because of the loss in hit rates, but i think the point here is that it even proves faster at the 256kb level. although i'm sure there are people that run spamc -s with values larger than 256kb, and performance at some point will get worse over that.

so, to justin's point, the increase in size of the function by rule does affect the memory footprint for one... hopefully it doesn't affect the speed. and that footprint will grow more with the addition of custom rules/rule sets.

memory usage, stock svn
7658 59.0  1.7 22692 18480 ?  Ss  17:39  0:00 /usr/bin/spamd -d

memory usage, svn + tmethod patch
7593  0.4  1.8 24556 18956 ?  Ss  17:37  0:00 /usr/bin/spamd -d

to duncan's point, if you "don't use any tmethod rules", then i guess there is no point in adding this to the core, unless the point is to extend the functionality for rule writers. Actually, extending rule writing functionality for SARE is the main reason i'm doing this.. most notably the multiline rawbody rules, which are impossible to write efficiently now since full /s testing is just too expensive.

dallas
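For readers who want to see the idea behind the 'tflag fast' experiment in isolation, here is a minimal sketch in Python (not SpamAssassin's own Perl, and not the actual patch). It contrasts a per-line regexp test with a single regexp test against the first max_bytes of the content. The function names, the rule pattern and the sample body are all made up for illustration.

```python
import re


def match_per_line(pattern, lines):
    """Baseline: run the rule's regexp against each line separately."""
    return any(pattern.search(line) for line in lines)


def match_fast(pattern, content, max_bytes):
    """'fast' variant: one regexp test against the first max_bytes only."""
    return pattern.search(content[:max_bytes]) is not None


# Hypothetical rule and message body, used only for illustration.
rule = re.compile(rb"click here", re.IGNORECASE)
body = b"hello world\n" * 1000 + b"click here now\n"  # match sits at byte 12000

hit_per_line = match_per_line(rule, body.splitlines())  # scans every line
hit_fast_4k = match_fast(rule, body, 4096)              # match lies past the cutoff
hit_fast_256k = match_fast(rule, body, 262144)          # cutoff exceeds body size

print(hit_per_line, hit_fast_4k, hit_fast_256k)  # True False True
```

The 4096-byte case misses the phrase because it appears after the cutoff. That is the hit-rate loss mentioned above, and it is the reason a small max_bytes would not be applied to an entire ruleset.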

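As a quick check of the "about 20%" figure, the reduction in total scantime can be worked out from the totals pasted above. This is only arithmetic on the reported numbers. The labels and the helper name are illustrative.

```python
# Total scantimes in seconds, copied from the results above (740 files each).
baseline = 570.591863  # no 'tflag fast'

fast_runs = {
    "body 256kb / rawbody 256kb": 552.70254,
    "body 8kb / rawbody 32kb": 470.123985,
    "body 4kb / rawbody 16kb": 454.957886,
}


def reduction_pct(before, after):
    """Percentage reduction in total scantime relative to the baseline."""
    return (before - after) / before * 100.0


reductions = {label: reduction_pct(baseline, t) for label, t in fast_runs.items()}

for label, pct in reductions.items():
    print(f"{label}: {pct:.1f}% less total scantime")
# body 256kb / rawbody 256kb: 3.1% less total scantime
# body 8kb / rawbody 32kb: 17.6% less total scantime
# body 4kb / rawbody 16kb: 20.3% less total scantime
```

Note that this measures the drop in elapsed time. Expressed as throughput (baseline time divided by the new time), the 4kb/16kb run comes out at roughly 1.25x.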