Hi,

I am having conceptual problems with cross-validating an ROC. As I see it,
the only reason to draw an ROC is to show the individual fpr/tpr pairs, so
that one can choose the optimal setting for a specific application
(depending on prevalence, the costs of FP/FN, etc.). So, in fact, the ROC
just shows various algorithms, and you choose the one that suits you best.
The problem with validation is that it is supposed to be done on the final
algorithm, not on some intermediate result.
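
To make this concrete, here is a rough sketch in Python of what I mean by
choosing the optimal setting from the fpr/tpr pairs. The cost model and the
use of scikit-learn's roc_curve are just my own illustration, not anything
standard:

import numpy as np
from sklearn.metrics import roc_curve

def best_operating_point(y_true, scores, prevalence, cost_fp, cost_fn):
    # Compute the fpr/tpr pairs of the ROC, then pick the threshold that
    # minimises the expected cost per case for the given prevalence and
    # FP/FN costs (this cost model is an assumption, for illustration only).
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    expected_cost = (cost_fp * fpr * (1 - prevalence)
                     + cost_fn * (1 - tpr) * prevalence)
    i = int(np.argmin(expected_cost))
    return thresholds[i], fpr[i], tpr[i]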

In more detail:
Consider algorithm A. It tests a number of algorithms (1..N) and chooses
the best one (say, number i). Even if algorithm A uses cross-validation to
train and test all N algorithms, we cannot say that the error rate of
algorithm A is the same as the estimated error rate of algorithm i. So we
cross-validate algorithm A: we use one data set to train it (and thus to
select i) and an independent set to test its performance.
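
To illustrate what I mean by cross-validating algorithm A, here is a minimal
sketch, assuming the N candidate algorithms are scikit-learn estimators
passed in as a list (the function names are my own, hypothetical ones):

import numpy as np
from sklearn.model_selection import KFold, cross_val_score

def algorithm_A(X_train, y_train, candidates):
    # Inner step: pick the candidate with the best cross-validated score on
    # the training data only, then refit it on all of the training data.
    inner_scores = [cross_val_score(m, X_train, y_train, cv=5).mean()
                    for m in candidates]
    best = candidates[int(np.argmax(inner_scores))]
    return best.fit(X_train, y_train)

def outer_cv_error(X, y, candidates, n_splits=5):
    # Outer step: cross-validate the whole select-then-train procedure.
    # This estimates the error of algorithm A, not of any single candidate.
    errors = []
    outer = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in outer.split(X):
        model = algorithm_A(X[train_idx], y[train_idx], candidates)
        errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
    return float(np.mean(errors))

The point is that the outer estimate and the inner estimate for candidate i
are generally not the same, because the outer one also accounts for the
selection step.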

Now, if we compare this to the ROC, the ROC is like the outcome of all N
algorithms. Based on the application, one would choose the best algorithm.
Cross-validation is therefore not possible before this selection has been
made.

On the other hand, one could of course 'cross-validate' the ROC. For
example, the ROCs of the several folds could be averaged in some way, or
the individual tpr/fpr pairs could be cross-validated.
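
As a sketch of the first option (again assuming scikit-learn, and using
vertical averaging on a common fpr grid as just one of several possible ways
to average the folds):

import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold

def averaged_roc(model, X, y, n_splits=5):
    # Fit the model on each training fold, compute an ROC on the matching
    # test fold, interpolate its tpr on a common fpr grid, and average the
    # interpolated curves over the folds ("vertical" averaging).
    fpr_grid = np.linspace(0.0, 1.0, 101)
    tprs = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        scores = fitted.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, _ = roc_curve(y[test_idx], scores)
        tprs.append(np.interp(fpr_grid, fpr, tpr))
    return fpr_grid, np.mean(tprs, axis=0)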

I would appreciate any comments on this!

Regards,
Koen Vermeer
