http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376
------- Additional Comments From [EMAIL PROTECTED] 2007-07-06 04:58 -------

(In reply to comment #8)

> Re: Bayes rules.
> No, they should not be immutable. If you want, we can require them to be
> "sane", for some definition of sane. There's no compelling reason for them
> to be the exact values they are.

Possibly not the exact values they are right now. But I think we have to
disagree about whether they need to be immutable; I really would prefer that
they are. Other developers' thoughts would be welcome here...

> Re: Specifying score ranges
> This is pretty tricky; I'm not sure what we can do here. The problem is
> that score ranges are inherently non-mathematical, really more of a shot
> in the dark, and there's no real way to evaluate them. Having a different
> corpus form the test set is probably a better real-world test of the
> algorithm, but it also adds a whole lot of luck (I think). If we were
> evaluating *algorithms* rather than submitted sets of scores, we could try
> something like cross-fold validation, but split into folds based on corpus
> as you suggest. I don't know if it would work in practice (or even in
> theory, for that matter).

Maybe that would be a good additional test step. I agree that 10-fold
cross-validation (10FCV) is really the best way to check out an algorithm's
workability...

> Re: Evaluation
> I'm not entirely sure what you're trying to say. Specifying a max FP rate
> and minimizing FNs given that rate is not the same as minimizing TCR. TCR
> builds in a relative cost of FPs and FNs (namely lambda), and is probably
> a simpler criterion. I don't think we'll see really high FPs if we are
> trying to obtain optimal TCR with lambda = 50.
>
> (Remember that in terms of TCR(lambda=50), a score set with FP = 0.4% and
> FN = 0% is equivalent to one with FP = 0% and FN = 20%, assuming a 50-50
> ham/spam mix.)

Ah, here's another problem with TCR I'd forgotten about.
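As a sanity check of the equivalence claimed in the quoted comment, here is a small sketch using the standard TCR definition from the anti-spam literature, TCR = Nspam / (lambda * nFP + nFN). (The exact formula implemented by SpamAssassin's own tools is not confirmed here; this is just the textbook definition.)

```python
# Verify: at lambda=50 and a 50-50 ham/spam mix, FP=0.4%/FN=0% and
# FP=0%/FN=20% yield the same TCR, because each FP costs 50x an FN.

def tcr(fp_pct, fn_pct, n_spam, n_ham, lam=50):
    """Total Cost Ratio: TCR = Nspam / (lambda * nFP + nFN)."""
    n_fp = fp_pct / 100.0 * n_ham   # ham messages misclassified as spam
    n_fn = fn_pct / 100.0 * n_spam  # spam messages that got through
    return n_spam / (lam * n_fp + n_fn)

a = tcr(fp_pct=0.4, fn_pct=0.0, n_spam=1000, n_ham=1000)
b = tcr(fp_pct=0.0, fn_pct=20.0, n_spam=1000, n_ham=1000)
print(a, b)  # both 5.0: the two score sets are indistinguishable under TCR
```

In other words, lambda=50 builds the "one FP is as bad as fifty FNs" trade-off directly into the single figure.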
It varies based on the size of the corpora:

: jm 372...; masses/fp-fn-to-tcr -lambda 50 -fn 5 -fp 0.5 -spam 1000 -ham 1000
# TCR(l=50): 33.500000
: jm 373...; masses/fp-fn-to-tcr -lambda 50 -fn 5 -fp 0.5 -spam 2000 -ham 2000
# TCR(l=50): 66.833333
: jm 374...; masses/fp-fn-to-tcr -lambda 50 -fn 5 -fp 0.5 -spam 3000 -ham 3000
# TCR(l=50): 100.166667

(I can't believe I forgot about that!) I think we should avoid it in
general; if we can't compare results because the corpus changed size, that's
a bad thing. I know it's nice to have a single figure of merit, but I
haven't found a good one yet; nothing that's as comprehensible or easy to
use as FP%/FN%.

> Re: Fire and Forget
> In development of these algorithms, it's much easier to get a process
> down, then automate it later. If we don't want to see/evaluate the results
> before requiring people to have a fully automated system, we might be
> wasting effort.

OK, maybe so.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
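The corpus-size dependence shown by the fp-fn-to-tcr runs above can be sketched with the standard definition TCR = Nspam / (lambda * nFP + nFN): if the error *counts* are held fixed while the corpus grows (which is what the identical -fn/-fp arguments above appear to do), TCR scales linearly with Nspam. (How fp-fn-to-tcr interprets its arguments is an assumption here; exact agreement with its output is not claimed.)

```python
# With fixed misclassification counts, TCR grows linearly with corpus size,
# so TCR values from differently sized corpora are not comparable.

def tcr_from_counts(n_fp, n_fn, n_spam, lam=50):
    """TCR = Nspam / (lambda * nFP + nFN), with nFP/nFN as raw counts."""
    return n_spam / (lam * n_fp + n_fn)

for n_spam in (1000, 2000, 3000):
    print(n_spam, tcr_from_counts(n_fp=0.5, n_fn=5, n_spam=n_spam))
# TCR doubles and triples as the corpus does, mirroring the jump from
# ~33 to ~67 to ~100 in the transcript above.
```

FP% and FN%, by contrast, are already normalized by corpus size, which is why they remain comparable across differently sized test sets.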
