https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6114





--- Comment #11 from Adam Katz <apa...@khopis.com>  2009-07-16 16:13:44 PST ---
Regexp::Assemble looks like the more interesting of the two, even if it's
easier for me to split() the regexp into pieces and then add() them to the RA
object.  Sure enough, it was a quick edit (only 8 lines of code, and I think
the resulting code is cleaner anyway).  My main worry is that Regexp::Assemble
optimizes more for regexp size than for matching performance.
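
For readers unfamiliar with the split-then-add approach, here is a rough sketch
of the idea in Python. It is not the actual 8-line Perl edit and it does not use
Regexp::Assemble; it only mimics the prefix-merging step with a small trie. The
function name assemble() and the sample address prefixes are made up for
illustration, and the sketch assumes every piece is a plain literal with no
nested groups, which the real module does not require.

```python
import re


def assemble(alternatives):
    """Merge literal alternatives into one pattern that shares common prefixes.

    Hypothetical helper: a simplified stand-in for what Regexp::Assemble does.
    """
    trie = {}
    for alt in alternatives:
        node = trie
        for ch in alt:
            node = node.setdefault(ch, {})
        node[""] = {}  # empty key marks the end of one alternative

    def emit(node):
        ends_here = "" in node
        branches = [re.escape(ch) + emit(child)
                    for ch, child in sorted(node.items()) if ch != ""]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]
        body = "(?:" + "|".join(branches) + ")"
        # If an alternative ends at this node, the rest is optional.
        return body + "?" if ends_here else body

    return emit(trie)


# Assumed input: a flat alternation of literal prefixes (documentation ranges).
flat = "192.0.2|198.51.100|198.51.101|203.0.113"
pieces = flat.split("|")            # the split() step
merged = re.compile(assemble(pieces))  # the add()-and-assemble step

for prefix in pieces:
    assert merged.fullmatch(prefix)
```

The merged pattern accepts exactly the same strings as the flat list, but whether
it ends up shorter or faster depends on how much the alternatives overlap.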

I've also merged TOP10+TOP20+TOP100+TOP200 into TOP200.  After reduction via
Regexp::Assemble, its definition is 2751 characters with a slew of nesting,
which is a thousand more than when it was just a list of SpamCop's 101-200 top
offenders.

I'm going to sit on it for a few days before pushing it here just in case it
doesn't work well (though it's live on my sa-update channel).

Any comments on my conclusions when I said this?
> Additionally, recall that I assigned a very small number of points to the
> CIDR8 rules as I was fully expecting some FPs.  I've even scored them a
> little lower just in case, clocking in at 0.6 for TOP_CIDR8 and 0.2 for
> CIDR8.  Perhaps I'm not reading the score-map right, but 95.77% of the ham
> hits scored under 3.999 (84.14% scored under 0.999), so a small bump won't
> make a difference.  Given the current data, T_KHOP_SC_CIDR8 would only add
> points to ONE false positive hit (0.21% of the ham) and even if scored at
> 2.0, it would create 23 FPs (4.87% of the 0.8152% of the hams, which is to
> say 0.0397% of the ham).  Scoring it 1.0 or less wouldn't actually have
> added any FPs.
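
The compound percentage in the quoted paragraph can be checked with a couple of
lines of Python. The variable names are made up here; the two inputs are the
figures quoted above, taken at face value.

```python
# Figures as quoted: 4.87% of a subset that is itself 0.8152% of all ham.
subset_pct_of_ham = 0.8152      # percent of all ham (hypothetical name)
fp_pct_of_subset = 4.87         # percent of that subset (hypothetical name)

# Percent of a percent: convert one factor to a fraction, then multiply.
fp_pct_of_ham = fp_pct_of_subset / 100 * subset_pct_of_ham

print(f"{fp_pct_of_ham:.4f}% of all ham")  # prints "0.0397% of all ham"
```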
