Kevin A. McGrail writes:
> > hey -- anyone think we should consider getting 3.2.0 out before January? I
> > think it may be doable.
> >
> > The one major feature I want to get in is the re2c/sa-compile speedup code
> > in the side branch -- it provides about a 20% speedup of scanning by
> > compiling parts of the ruleset into native code, which is nice. ;)
>
> I would like to see it be released before January. The 20% speedup sounds
> amazing especially because I see more and more rules each day. Is there any
> reduced RAM usage as well? I assume there is 20% less CPU usage just
> because it finishes quicker.

Yep, CPU time goes down by a similar amount (SpamAssassin is generally
CPU-bound). However, I don't think it really helps RAM usage; it probably
increases it a little, unfortunately. I agree reducing RAM usage is
important though, esp. now that RAM-to-CPU bandwidth is becoming even more
of a bottleneck than CPU time... need to look into this more.

Here are some timings, btw. I tested it on a couple of weeks of my corpus --
3395 hams and 15795 spams -- using perl 5.8.8, mass-check, and the latest
SVN trunk ruleset including sandbox rules.

Without rule2xs active:

  real avg=2037.131s min=2032.047s max=2045.501s count=3
  user avg=1884.417s min=1881.802s max=1887.930s count=3
  sys  avg=29.990s   min=28.354s   max=31.446s   count=3

that's (19190 / 2037.131) = 9.42 messages/sec.

With the compiled ruleset:

  real avg=1781.106s min=1769.190s max=1797.974s count=4
  user avg=1637.173s min=1633.754s max=1640.727s count=4
  sys  avg=27.706s   min=22.197s   max=31.578s   count=4

that's (19190 / 1781.106) = 10.77 messages/sec, about a 14% speedup. (It
varies depending on what rules are loaded and what mail is scanned, btw,
hence 14 != 20.)

> On a similar topic, perhaps, I have been contemplating if the compilation to
> native code could do something to not require ?: on every () regexp. I
> find that A) I'm lazy on adding them and B) they can get insane on trying to
> read and debug some of the more complex rules.

Yeah -- it'll do this automatically. However, it's an optional plugin, and
most people will probably not be using it -- so it can't be counted on being
loaded :(

For what it's worth, we should be extending --lint to warn about these --
that would make it pretty clear when it needs to be fixed, I think.

> I've been talking with Mark Damrose about this and since you have to use \1
> \2, for the replacements, could the "re2c/sa-compile" be changed to
> additionally automatically add ?: to regexp without \1, etc.? This should
> save a little on RAM and overhead, though I'm not sure how much really.

hmm, unfortunately \1 and so on are too advanced for the rule2xs compiler;
it'll leave those rules as non-compiled body rules. Unfortunately re2c
isn't up to the full perl regexp vocabulary -- despite the sterling work
that Matt Sergeant has done in writing the compiler code to translate much
of it, there's still a lot of flexibility in perl's regexps that doesn't
translate to the re2c model (something to do with DFAs vs NFAs I think ;)

(oh yeah -- credit where due -- Matt is the guy who wrote much of this, esp
the rule2xs code which translates perl regexps into re2c in the form of a
perl XS module. My hacking is mostly glue ;)

--j.
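
For anyone who wants to re-check the arithmetic, the messages/sec figures in
the timings above can be re-derived with a few lines of Python. This is only
a sketch: the constant and function names are made up for illustration, and
it uses just the averaged "real" times from the two runs. Note that the 14%
is the gain in throughput; the drop in wall-clock time works out a little
lower, at roughly 12.6%.

```python
# Re-derive the throughput figures from the averaged mass-check timings.
# All names here are illustrative; only the numbers come from the runs above.

HAMS = 3395
SPAMS = 15795
MESSAGES = HAMS + SPAMS  # 19190 messages in the test corpus

REAL_AVG_PLAIN = 2037.131     # seconds, without rule2xs active
REAL_AVG_COMPILED = 1781.106  # seconds, with the compiled ruleset


def throughput(messages, seconds):
    """Messages scanned per second of wall-clock time."""
    return messages / seconds


plain = throughput(MESSAGES, REAL_AVG_PLAIN)
compiled = throughput(MESSAGES, REAL_AVG_COMPILED)

# Speedup expressed as the relative gain in messages/sec.
speedup_pct = (compiled / plain - 1.0) * 100.0

# The same change expressed as the relative drop in wall-clock time.
time_saved_pct = (1.0 - REAL_AVG_COMPILED / REAL_AVG_PLAIN) * 100.0

print(f"plain:    {plain:.2f} messages/sec")
print(f"compiled: {compiled:.2f} messages/sec")
print(f"throughput gain:  {speedup_pct:.1f}%")
print(f"wall-clock saved: {time_saved_pct:.1f}%")
```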
