http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5686
------- Additional Comments From [EMAIL PROTECTED]  2007-10-23 05:08 -------
(In reply to comment #7)
> 3. different tokenization

so I tried some of this out last night; I took one of the persistent FNs that keeps showing up around the 0.2 mark, and examined the tokens being generated during tokenization. It turned out that some of the OSBF tokenization didn't cope well with some of *our* tokens.

1. The decomposed address tokens, like "UD*jmason.org" for an email addr containing the domain "taint.org", were being split up into two tokens, "UD*" and "jmason.org" -- not useful -- so I fixed that;

2. the "key=value" metadata in the X-Spam-Relays headers was similarly being broken up into "key=", "value". fixed.

this is checked in as r587469. here's a histogram:

SCORE  NUMHIT   DETAIL     OVERALL HISTOGRAM  (. = ham, # = spam)
0.000 (21.949%) ..........|............................................
0.040 (21.620%) ..........|...........................................
0.080 (27.737%) ..........|.......................................................
0.120 (12.351%) ..........|........................
0.160 ( 6.993%) ..........|..............
0.160 ( 0.044%) #         |
0.200 ( 4.802%) ..........|..........
0.200 ( 0.006%)           |
0.240 ( 2.656%) ..........|.....
0.280 ( 1.169%) ..........|..
0.280 ( 0.055%) #         |
0.320 ( 0.400%) ..........|.
0.320 ( 0.215%) #####     |
0.360 ( 0.172%) .......   |
0.360 ( 0.287%) #######   |
0.400 ( 0.056%) ..        |
0.400 ( 0.287%) #######   |
0.440 ( 0.083%) ##        |
0.480 ( 0.096%) ....      |
0.480 ( 1.075%) ##########|#
0.520 ( 0.276%) #######   |
0.560 ( 0.573%) ##########|#
0.600 ( 0.843%) ##########|#
0.640 ( 1.725%) ##########|##
0.680 ( 5.545%) ##########|#######
0.720 (20.387%) ##########|########################
0.760 (46.555%) ##########|#######################################################
0.800 (20.800%) ##########|#########################
0.840 ( 1.141%) ##########|#
0.880 ( 0.017%)           |
0.920 ( 0.017%)           |
0.960 ( 0.072%) ##        |

Threshold optimization for hamcutoff=0.30, spamcutoff=0.70: cost=$178.60
Total ham:spam:   19764:18144
FP:     0  0.000%    FN:     9  0.050%
Unsure:  1696  4.474%     (ham:   374  1.892%    spam:  1322  7.286%)
TCRs:              l=1 13.632    l=5 13.632    l=9 13.632

Threshold optimization for hamcutoff=0.30, spamcutoff=0.54: cost=$130.40
Total ham:spam:   19764:18144
FP:     0  0.000%    FN:    11  0.061%
Unsure:   597  1.575%     (ham:   220  1.113%    spam:   377  2.078%)
TCRs:              l=1 46.763    l=5 46.763    l=9 46.763

looking quite a bit better!
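To make the two tokenization fixes concrete, here is a minimal Python sketch of the idea. It is not the code from r587469 (SpamAssassin itself is Perl); the function names, the regular expressions, and the sample input are all hypothetical, chosen only to reproduce the splitting behaviour described above and one possible way of avoiding it.

```python
import re

# Hypothetical "naive" splitter: breaks after '*' and '=', which reproduces
# the unhelpful splits described above ("UD*" + "jmason.org", "key=" + "value").
NAIVE = re.compile(r"[^\s*=]+[*=]?")

# Hypothetical shapes for tokens that should stay whole:
#   PREFIX*rest  (decomposed address tokens such as "UD*jmason.org")
#   key=value    (relay metadata pairs)
ATOMIC = re.compile(r"[A-Za-z]+\*\S+|[\w-]+=\S+")


def naive_tokenize(text):
    """Split everywhere the naive pattern allows, including inside our tokens."""
    return NAIVE.findall(text)


def tokenize(text):
    """Keep whole any whitespace-delimited chunk that looks atomic;
    fall back to the naive splitter for everything else."""
    tokens = []
    for chunk in text.split():
        if ATOMIC.fullmatch(chunk):
            tokens.append(chunk)
        else:
            tokens.extend(NAIVE.findall(chunk))
    return tokens


sample = "UD*jmason.org ip=1.2.3.4"  # illustrative input, not real header data
print(naive_tokenize(sample))
print(tokenize(sample))
```

The design choice in this sketch is to decide per whitespace-delimited chunk whether it is one of "our" tokens before any finer splitting happens, so the generic splitter never sees the "*" or "=" inside them.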
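The summary figures can be cross-checked with a few lines of Python. One caveat: the TCR formula below is inferred from the reported numbers, not taken from the script that produced them. The values happen to be consistent with counting unsure spam alongside FNs in the denominator, and since FP is 0 the weight l has no effect, which would explain why l=1, l=5 and l=9 all print the same value. The helper names are hypothetical.

```python
def pct(count, total):
    """Percentage of `count` within `total`."""
    return 100.0 * count / total


def tcr(n_spam, fp, fn, unsure_spam, lam=1):
    """Total cost ratio under the ASSUMED formula
    n_spam / (lam * FP + FN + unsure spam); inferred from the report,
    not confirmed against the actual script."""
    return n_spam / (lam * fp + fn + unsure_spam)


n_ham, n_spam = 19764, 18144
total = n_ham + n_spam

# spamcutoff=0.70: FP 0, FN 9, unsure 1696 (of which spam 1322)
print(round(pct(9, n_spam), 3), round(pct(1696, total), 3))
print(round(tcr(n_spam, 0, 9, 1322), 3))

# spamcutoff=0.54: FP 0, FN 11, unsure 597 (of which spam 377)
print(round(pct(11, n_spam), 3), round(pct(597, total), 3))
print(round(tcr(n_spam, 0, 11, 377), 3))
```

Under that assumption the two TCR results round to 13.632 and 46.763, matching the report, and the FN and unsure percentages match as well.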
