https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6114
--- Comment #11 from Adam Katz <apa...@khopis.com> 2009-07-16 16:13:44 PST ---

Regexp::Assemble looks like the more interesting of the two, even if it's
easier for me to split() the regexp into pieces and then add() them to the RA
object. Sure enough, it was a quick edit (only 8 lines of code, and I think
the resulting code is cleaner anyway).

My main worry is that the optimization is more for regexp size than for
performance. I've also merged TOP10+TOP20+TOP100+TOP200 into TOP200, which
makes its definition 2751 characters with a slew of nesting after reduction
via Regexp::Assemble, which is a thousand more than when it was just a list of
SpamCop's 101-200 top offenders.

I'm going to sit on it for a few days before pushing it here just in case it
doesn't work well (though it's live on my sa-update channel).

Any comments on my conclusions when I said this?

> Additionally, recall that I assigned a very small number of points to the
> CIDR8 rules as I was fully expecting some FPs. I've even scored them a
> little lower just in case, clocking in at 0.6 for TOP_CIDR8 and 0.2 for
> CIDR8. Perhaps I'm not reading the score-map right, but 95.77% of the ham
> hits scored under 3.999 (84.14% scored under 0.999), so a small bump won't
> make a difference. Given the current data, T_KHOP_SC_CIDR8 would only add
> points to ONE false positive hit (0.21% of the ham) and even if scored at
> 2.0, it would create 23 FPs (4.87% of the 0.8152% of the hams, which is to
> say 0.0397% of the ham). Scoring it 1.0 or less wouldn't actually have
> added any FPs.
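
The percentages in the quoted paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration in Python, not part of the rule set or the mass-check tooling. The variable names are made up for the example, and the count of ham hits is an assumption inferred from "23 FPs = 4.87%", since the comment does not state it directly.

```python
# Figures quoted in the comment above.
fps_at_score_2 = 23                # FPs the rule would create if scored 2.0
share_of_rule_ham_hits = 0.0487    # those 23 FPs are 4.87% of the rule's ham hits
rule_ham_hit_rate = 0.008152       # the rule hits 0.8152% of all ham

# Assumption: the number of ham hits is inferred here, not given in the comment.
ham_hits = round(fps_at_score_2 / share_of_rule_ham_hits)

# FP rate over all ham if the rule were scored 2.0.
fp_rate_at_2 = share_of_rule_ham_hits * rule_ham_hit_rate

# Share represented by a single false positive among the rule's ham hits.
one_hit_share = 1 / ham_hits

print(f"inferred ham hits: {ham_hits}")
print(f"FP rate at score 2.0: {fp_rate_at_2:.4%} of all ham")
print(f"one hit: {one_hit_share:.2%} of the rule's ham hits")
```

Under that assumption the numbers agree with each other: roughly 472 ham hits, 0.0397% of all ham at a score of 2.0, and 0.21% for a single hit, which suggests the "0.21% of the ham" figure is measured against the rule's ham hits rather than against all ham.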