Hi, I am having conceptual problems with cross-validating an ROC curve. The thing is that, for me, the only reason to draw an ROC is to show the individual fpr/tpr pairs, so that one can choose the optimal operating point for a specific application (depending on prevalence, cost of FP/FN, etc.). So, in fact, the ROC just shows a collection of candidate classifiers, and you choose the one that suits you best. The thing with validation is that it is supposed to be done on the final classifier, not on some intermediate result.
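(To make concrete what I mean by "choosing the optimal operating point", here is a toy sketch in Python; the ROC points, prevalence, and costs below are all made up for illustration, not taken from any real application.)

    # Pick the ROC point with the lowest expected misclassification cost,
    # given application-specific prevalence and FP/FN costs (all hypothetical).
    import numpy as np

    fpr = np.array([0.0, 0.1, 0.2, 0.4, 1.0])   # hypothetical ROC points
    tpr = np.array([0.0, 0.6, 0.8, 0.9, 1.0])

    prevalence, cost_fp, cost_fn = 0.05, 1.0, 20.0

    expected_cost = ((1 - prevalence) * cost_fp * fpr
                     + prevalence * cost_fn * (1 - tpr))
    best = np.argmin(expected_cost)
    print("best operating point: fpr=%.2f, tpr=%.2f" % (fpr[best], tpr[best]))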
In more detail: consider algorithm A. It tests a number of algorithms (1..N) and chooses the best one (say number i). Even if algorithm A uses cross-validation to train and test all N algorithms, we cannot say that the error rate of algorithm A is the same as the estimated error rate of algorithm i. So, we cross-validate algorithm A: we use one data set to train it (and thus to select i) and an independent set to test its performance.

Now, if we compare this to the ROC, the ROC is like the outcome of all N algorithms. Based on the application, one would choose the best algorithm, so cross-validation is not possible before this selection has been made. On the other hand, one could of course 'cross-validate' the ROC itself. For example, the ROCs of the several folds could be averaged in some way, or the individual tpr/fpr pairs could be cross-validated.

I would appreciate any comments on this!

Regards,
Koen Vermeer
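P.S. To make concrete one way the per-fold ROCs "could be averaged in some way", here is a rough sketch in Python using scikit-learn. The synthetic data and the logistic-regression model are only placeholders, and vertical averaging (interpolating each fold's tpr onto a common fpr grid) is just one possible averaging scheme.

    # For each CV fold: fit a classifier, compute its ROC, interpolate the
    # fold's TPRs onto a shared FPR grid, then average across folds.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, auc
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=500, random_state=0)

    common_fpr = np.linspace(0.0, 1.0, 101)   # shared FPR grid for averaging
    tprs = []

    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                               random_state=0).split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, _ = roc_curve(y[test_idx], scores)
        fold_tpr = np.interp(common_fpr, fpr, tpr)  # vertical interpolation
        fold_tpr[0] = 0.0                           # force the curve through (0, 0)
        tprs.append(fold_tpr)

    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0                              # force the curve through (1, 1)
    print("cross-validated AUC (vertical average): %.3f" % auc(common_fpr, mean_tpr))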
