On Sun, 02 Feb 2003 00:48:40 -0800, Greg Heath wrote:

> "Koen Vermeer" <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
>
>> I have an experiment in which I have to design some algorithm that
>> detects something. My question is related to the evaluation of the
>> performance of the algorithm. Normally, I do this with the true
>> positive rate (TPR) and the false positive rate (FPR) (or
>> sensitivity/specificity), or a full ROC. However, in this case, I am a
>> bit worried whether my method is right.
>
> Your measure of performance is not clear.
A full ROC would be the best thing. However, I cannot do that at this
stage, since that would require an expert to produce a ground truth. We
don't have the funds to do that right now.

> Typically, an algorithm is designed to minimize or maximize an objective
> function that reflects a weighted (or constrained) balance of a positive
> class rate (i.e., either TPR (detection) or FNR (leakage)) and a
> negative class rate (e.g., FPR (false alarm)).

The results I obtain will, hopefully, not be the final results. They are
more a kind of preliminary result, showing that the performance is good
enough to justify further investment in the method.

> The algorithm is then evaluated by the value of the objective function
> obtained on an independent test data set whose specific characteristics
> were in no way used to design the algorithm.

I didn't mention bootstrapping or cross-validation in my message, that's
true. I am planning on doing that, though. It was just that I was
wondering whether my method was acceptable: using two sets (from
different populations), one to set the FPR and one to determine the
corresponding TPR (see the sketch in the P.S. below).

> N_1 is used for calibration (e.g., to set false alarm thresholds). So
> you have a Neyman-Pearson type objective: minimize the FNR when the FPR
> is fixed.

[...]

> They don't appear to be. N_1, used to calibrate the algorithm, is not
> used to estimate the rates used for performance evaluation.

Not completely true. If I report my TPR (directly measuring the FNR is
not really an option, since that would require the expert to look at a
lot more data), everybody will ask what the corresponding FPR is. I can
fix my FPR and report that, but of course, in practice, the FPR would be
different from the one I set it to. I could, of course, use
cross-validation to estimate the true FPR. In fact, I have done that
(roughly as in the P.P.S. below), and, fortunately, the 'true' FPR seems
to be very close to the value it was set to (the set FPR is 0.015, the
true FPR is 0.017 or something like that).

>> But I feel they are less biased than when I would have a ground truth,
>> test my algorithm, with varying parameters, on it and select the
>> optimal TPR/FPR-set. Is there some literature on this approach? Or is
>> there anything I am overlooking?
>
> Just don't evaluate with calibration data.

You are right, of course. But that was not the intention of my message.
I should have stated that I was planning on doing cross-validation or
bootstrapping or something like that.

Regards,

Koen
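
P.S. To make the two-set scheme concrete, here is a rough sketch in
Python of what I mean. The scores and data below are just placeholders
simulated from normal distributions, not my actual detector or data; the
point is only the two steps: calibrate the threshold on one (negative)
set, then evaluate TPR and the realized FPR on an independent labelled
set.

import numpy as np

rng = np.random.default_rng(0)

# N_1: calibration scores from cases assumed to be negative (placeholder).
calib_neg_scores = rng.normal(0.0, 1.0, size=2000)

# N_2: an independent set with known labels, used only for evaluation
# (again simulated: 900 negatives, 100 positives with shifted scores).
test_scores = np.concatenate([rng.normal(0.0, 1.0, 900),
                              rng.normal(2.0, 1.0, 100)])
test_labels = np.concatenate([np.zeros(900), np.ones(100)])

target_fpr = 0.015

# Step 1: set the threshold on N_1 so that the fraction of calibration
# (negative) scores above it matches the target FPR.
threshold = np.quantile(calib_neg_scores, 1.0 - target_fpr)

# Step 2: evaluate on N_2, which was not used to pick the threshold.
detected = test_scores > threshold
tpr = detected[test_labels == 1].mean()
fpr = detected[test_labels == 0].mean()
print(f"threshold={threshold:.3f}  TPR={tpr:.3f}  realized FPR={fpr:.3f}")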

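P.P.S. And this is roughly how the cross-validation check of the 'true'
FPR works: split the negative calibration scores into folds, set the
threshold on the remaining folds at the target FPR, and count how many
held-out negatives exceed it. The scores here are again simulated
placeholders, not my real data.

import numpy as np

def cv_realized_fpr(neg_scores, target_fpr=0.015, n_folds=10, seed=0):
    """Cross-validated estimate of the FPR actually achieved when the
    threshold is calibrated to a target FPR on the remaining folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(neg_scores))
    folds = np.array_split(idx, n_folds)
    fpr_per_fold = []
    for k in range(n_folds):
        held_out = neg_scores[folds[k]]
        rest = neg_scores[np.concatenate(
            [folds[j] for j in range(n_folds) if j != k])]
        threshold = np.quantile(rest, 1.0 - target_fpr)
        fpr_per_fold.append(np.mean(held_out > threshold))
    return float(np.mean(fpr_per_fold))

# Toy negative scores just so the sketch runs; in practice these would be
# the algorithm's scores on the negative calibration data.
scores = np.random.default_rng(1).normal(size=5000)
print(cv_realized_fpr(scores))  # should land close to the 0.015 target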