http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
------- Additional Comments From [EMAIL PROTECTED] 2007-07-04 20:18 -------

Re: Bayes rules

No, they should not be immutable. If you want, we can require them to be "sane" for some definition of sane. There's no compelling reason for them to have the exact values they currently do.

Re: Specifying score ranges

This is pretty tricky; I'm not sure what we can do here. The problem is that score ranges are inherently non-mathematical, really more of a shot in the dark, and there's no real way to evaluate them. Having a different corpus form the test set is probably a better real-world test of the algorithm, but it also adds a whole lot of luck (I think). If we were evaluating *algorithms* rather than submitted sets of scores, we could try something like cross-fold validation, split into folds by corpus as you suggest (sketched below). I don't know whether that would work in practice (or even in theory, for that matter).

Re: Evaluation

I'm not entirely sure what you're trying to say. Specifying a maximum FP rate and minimizing FNs given that rate is not the same as optimizing TCR. TCR builds in a relative cost of FPs versus FNs (namely lambda), and it's probably the simpler criterion. I don't think we'll see really high FP rates if we're trying to obtain optimal TCR with lambda = 50. (Remember that in terms of TCR(lambda=50), a score set with FP = 0.4% and FN = 0% is equivalent to one with FP = 0% and FN = 20%, assuming a 50-50 ham/spam mix; the arithmetic is worked through in the second sketch below.)

Re: Fire and forget

When developing algorithms like these, it's much easier to get a process down first and automate it later. If we require people to have a fully automated system before we ever see and evaluate the results, we might be wasting effort.
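For what it's worth, here is a rough Python sketch of what corpus-based folds could look like. This is purely illustrative: the leave-one-corpus-out structure and every name in it (corpus_folds, optimize_scores, evaluate) are assumptions of mine, not anything we actually have.

def corpus_folds(corpora):
    """Leave-one-corpus-out splits: train on all corpora except one,
    then test on the held-out corpus (instead of random folds)."""
    for held_out in corpora:
        train = [msg for name, msgs in corpora.items()
                 if name != held_out
                 for msg in msgs]
        yield held_out, train, corpora[held_out]

# Hypothetical usage:
#   corpora = {"corpus_a": [...], "corpus_b": [...], "corpus_c": [...]}
#   for name, train, test in corpus_folds(corpora):
#       scores = optimize_scores(train)       # hypothetical rescoring run
#       print(name, evaluate(scores, test))   # e.g. TCR on the held-out corpus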

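And here is the TCR arithmetic from the evaluation paragraph as a quick Python check. The tcr() helper and the 1000-ham/1000-spam corpus sizes are assumptions chosen only to make the 50-50 mix concrete; the formula is the usual definition, TCR = n_spam / (lambda * n_fp + n_fn).

def tcr(n_spam, n_fp, n_fn, lam=50):
    """Total Cost Ratio: n_spam / (lam * n_fp + n_fn). Higher is better."""
    cost = lam * n_fp + n_fn
    return float("inf") if cost == 0 else n_spam / cost

n_ham = n_spam = 1000  # assumed 50-50 ham/spam mix

print(tcr(n_spam, n_fp=int(0.004 * n_ham), n_fn=0))           # FP=0.4%, FN=0%  -> 5.0
print(tcr(n_spam, n_fp=0, n_fn=int(0.20 * n_spam)))           # FP=0%,   FN=20% -> 5.0

Both score sets cost 200 units under lambda = 50, so they come out equivalent by this criterion, exactly as claimed above.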