https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400
--- Comment #8 from Darxus <[email protected]> 2011-11-28 19:39:46 UTC ---
(In reply to comment #7)
> I'd recommend the entire MSPIKE kit and kaboodle. I'm running with these
> scores and recommend them:

Really? The ranks of the _WL rule and its components are kind of bad. And I'm
concerned that including all the components of the _BL rule will cause the
rescorer to behave suboptimally with our relatively limited corpora.

Huh, although effectively it looks like it's just two components, _L4 and _L5,
since the other two are empty, so that's probably fine.

But if I did have a vote, I certainly wouldn't vote against using the score set
you recommended. I'm just curious what your reasoning is.

MSECS    SPAM%     HAM%    S/O  RANK  SCORE  NAME WHO/AGE
    0  74.6530   0.0067  1.000  0.99   0.00  T_RCVD_IN_MSPIKE_BL
    0   0.0251   7.0172  0.004  0.83   0.00  T_RCVD_IN_MSPIKE_WL
    0   0.8738        0  1.000  0.79   0.00  T_RCVD_IN_MSPIKE_ZBI

Components of _BL:
    0        0        0  0.500  0.48   0.00  T_RCVD_IN_MSPIKE_L2
    0        0        0  0.500  0.48   0.00  T_RCVD_IN_MSPIKE_L3
    0  48.1830   0.0059  1.000  0.98   0.00  T_RCVD_IN_MSPIKE_L4
    0  25.5962   0.0007  1.000  0.97   0.00  T_RCVD_IN_MSPIKE_L5

Components of _WL:
    0   0.1684  13.3764  0.012  0.72   0.00  T_RCVD_IN_MSPIKE_H2
    0   0.0241   6.9795  0.003  0.84   0.00  T_RCVD_IN_MSPIKE_H3
    0   0.0010   0.0355  0.029  0.50   0.00  T_RCVD_IN_MSPIKE_H4
    0        0   0.0022  0.000  0.48   0.00  T_RCVD_IN_MSPIKE_H5

Somebody should create a graph with the number of randomly sampled emails from
the corpora on one axis and the accuracy rate on the other. That would give us
some actual numbers on how much email we need for a given accuracy.
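A rough sketch of the sample-size experiment suggested above, in Python. It is not run against the real corpora: it draws synthetic emails using the T_RCVD_IN_MSPIKE_H2 hit rates from the table, and it assumes the S/O column is computed as spam% / (spam% + ham%), with 0.500 when a rule never hits, which appears consistent with most of the rows shown. The function names, the half-spam/half-ham split, and the trial count are all illustrative assumptions.

```python
import random


def s_o(spam_pct, ham_pct):
    """S/O ratio, assumed here to be spam% / (spam% + ham%).

    Returns 0.5 when the rule hits nothing, matching the empty rows in the table.
    """
    total = spam_pct + ham_pct
    return 0.5 if total == 0 else spam_pct / total


def sampled_s_o(n, spam_rate, ham_rate, rng):
    """Draw n synthetic emails and return the S/O estimate from that sample.

    Assumption: the sample is half spam and half ham; real corpora will differ.
    """
    n_spam = n // 2
    n_ham = n - n_spam
    spam_hits = sum(rng.random() < spam_rate for _ in range(n_spam))
    ham_hits = sum(rng.random() < ham_rate for _ in range(n_ham))
    return s_o(100.0 * spam_hits / n_spam, 100.0 * ham_hits / n_ham)


def mean_abs_error(n, spam_rate, ham_rate, trials=20, seed=0):
    """Average distance between the sampled S/O and the S/O of the true rates."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    truth = s_o(100.0 * spam_rate, 100.0 * ham_rate)
    errors = [
        abs(sampled_s_o(n, spam_rate, ham_rate, rng) - truth)
        for _ in range(trials)
    ]
    return sum(errors) / trials


if __name__ == "__main__":
    # Hit rates for T_RCVD_IN_MSPIKE_H2 from the table, as fractions.
    spam_rate, ham_rate = 0.001684, 0.133764
    for n in (1_000, 10_000, 50_000):
        print(n, round(mean_abs_error(n, spam_rate, ham_rate), 4))
```

Plotting the printed pairs (sample size against mean error) would give the kind of graph described. Replacing the synthetic draws with random samples from the actual mass-check logs would be the real test.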
