On Fri, 18 Jul 2003 17:26:44 -0400, Frank E Harrell Jr wrote:

> The choice of a single cutpoint presents a host of statistical and
> subject matter deficiencies, but if you really need one (which implies
> that your internal utility function is the same as the consumers') you
> are right that you need to make the cross-validation take the cutpoint
> search into account.  The bootstrap is probably the best approach.  Have
> an algorithm for choosing the "best" cutpoint and repeat that algorithm
> 200 times by replacing the original dataset with samples with
> replacement from the original (using the same total number of
> observations).  You can get a confidence interval for the cutpoint this
> way.

Thanks for your reply.
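
To make sure I understand the procedure you describe, here is a minimal sketch of how I read it (Python; the Youden-index rule for the "best" cut-off, and all names here, are my own illustration, since your post does not fix a particular rule):

import numpy as np

def best_cutpoint(x, y):
    # Pick the cut-off that maximizes Youden's J (sensitivity + specificity - 1).
    # This "best" rule is only one possible choice, used for illustration.
    # x and y are assumed to be numpy arrays, with y coded 0/1.
    best_c, best_j = None, -np.inf
    for c in np.unique(x):
        pred = x > c                      # classify as class A when x > c
        sens = np.mean(pred[y == 1])      # true positive rate
        spec = np.mean(~pred[y == 0])     # true negative rate
        if sens + spec - 1.0 > best_j:
            best_c, best_j = c, sens + spec - 1.0
    return best_c

def bootstrap_cutpoint_ci(x, y, n_boot=200, alpha=0.05, seed=0):
    # Repeat the whole cut-off search on bootstrap resamples of the original
    # data (same n, sampling with replacement) and take percentiles of the
    # resulting cut-offs as a confidence interval.
    rng = np.random.default_rng(seed)
    n = len(x)
    cuts = [best_cutpoint(x[idx], y[idx])
            for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    return np.percentile(cuts, [100 * alpha / 2, 100 * (1 - alpha / 2)])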

If I could choose a single cutpoint, I'd certainly do that; there would be no
need to create an ROC in that case. However, I do not know the utility function.
I am merely trying to find a way to create a 'fair' ROC. By that I mean an ROC
that gives an estimate of the generalization error that is not flawed by the
bias introduced by selecting the optimal cutpoint.

Maybe some clarification of this bias: suppose we have a very simple
classification algorithm, f(x) = x. Now we create an ROC based on the
cut-off value (f(x) > C is class A, f(x) < C is class B). How is the point
on the ROC determined for each value of C? I'd say no validation is
necessary in this case, since we didn't estimate any parameter other than
the cut-off value. However, if we draw the ROC, pick a cut-off value, and
then test the algorithm on a different data set, the error is likely to be
larger than the one implied by the ROC.
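
For concreteness, a small sketch of what I mean (again Python, with synthetic data; everything here is my own illustration): each candidate cut-off C gives one (FPR, TPR) point on the ROC computed from the data at hand, and if we then pick the C that looks best on that same data, its error on an independent data set will typically be worse than the apparent error.

import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    # Synthetic data: the score is only weakly related to the class.
    y = rng.integers(0, 2, size=n)
    x = y + rng.normal(0.0, 1.5, size=n)   # f(x) = x plays the role of the score
    return x, y

def roc_points(x, y):
    # One (FPR, TPR) point per candidate cut-off C, using "f(x) > C is class A".
    pts = []
    for c in np.unique(x):
        pred = x > c
        pts.append((c, np.mean(pred[y == 0]), np.mean(pred[y == 1])))
    return pts

def error_at(c, x, y):
    return np.mean((x > c) != (y == 1))    # misclassification rate at cut-off c

x_train, y_train = simulate(100)
x_new, y_new = simulate(100)

# Choose the cut-off that looks best on the data used to draw the ROC,
# then evaluate that same cut-off on an independent data set.
best_c = min((c for c, _, _ in roc_points(x_train, y_train)),
             key=lambda c: error_at(c, x_train, y_train))
print("apparent error:", error_at(best_c, x_train, y_train))
print("error on new data:", error_at(best_c, x_new, y_new))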

Regards,
Koen