Hi,

I have an experiment in which I have to design some algorithm that detects
something. My question is related to the evaluation of the performance of
the algorithm. Normally, I do this with the true positive rate (TPR) and
the false positive rate (FPR) (or sensitivity/specificity), or a full ROC.
However, in this case, I am not sure whether my method is right.

I have two datasets. One of them contains only negative samples (that is,
samples that do not contain the something I am looking for). The other one
contains both positive and negative samples. The number of samples in the
first set is N_1, the number of samples in the second set is N_2. I cannot
have an expert look at all of set 2 to classify every sample. I could,
however, have an expert look at a small subset of set 2.

The method I am trying to apply is to start with the negative sample set.
I run my algorithm on this set, and count the true negatives (TN_1) and
false positives (FP_1). By changing my parameters, I can select a
TN_1/FP_1 pair. With this data, I can calculate the FPR. I select the FPR
I like, and use the parameters associated with it for my algorithm. Now I
run my algorithm, with these parameters, on set 2. I get a number of
positive responses, which is the sum of FP_2 and TP_2. I can let an expert
look at these and decide which ones are FPs and which are TPs. Assuming
the FPR from set 1 also holds on set 2, I can calculate TN_2 from FP_2,
since FPR = FP_2/(FP_2 + TN_2). With N_2, I can now calculate FN_2 and
the TPR.
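
To make the arithmetic concrete, here is a minimal sketch in Python of
the calculation I have in mind. The function name and the example counts
are made up for illustration, and it assumes that the FPR measured on
set 1 carries over unchanged to the negatives in set 2.

```python
def estimate_tpr(fp_1, tn_1, n_2, tp_2, fp_2):
    """Estimate the TPR on set 2 using the FPR measured on set 1.

    Assumption: the negatives in set 2 behave like those in set 1,
    so FPR = FP_1 / (FP_1 + TN_1) also holds on set 2.
    """
    if fp_1 == 0:
        raise ValueError("FPR is zero, so TN_2 cannot be inferred from FP_2")

    fpr = fp_1 / (fp_1 + tn_1)

    # FPR = FP_2 / (FP_2 + TN_2)  =>  negatives in set 2 = FP_2 / FPR.
    # Written with the raw counts to avoid an extra rounding step.
    neg_2 = fp_2 * (fp_1 + tn_1) / fp_1
    tn_2 = neg_2 - fp_2

    # Whatever is not a negative in set 2 must be a positive.
    pos_2 = n_2 - neg_2
    if pos_2 <= 0:
        raise ValueError("estimated number of positives is not positive")

    fn_2 = pos_2 - tp_2
    tpr = tp_2 / pos_2
    return {"fpr": fpr, "tn_2": tn_2, "fn_2": fn_2, "tpr": tpr}


# Hypothetical counts, only to show the call.
result = estimate_tpr(fp_1=50, tn_1=950, n_2=2000, tp_2=300, fp_2=80)
print(result)
```

Note that FP_2 is divided by the FPR, so with a small FPR any counting
error in FP_2 is magnified in the estimated number of negatives, and
therefore in FN_2 and the TPR.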

I am certain my results will be a bit biased. But I feel they are less
biased than if I had a ground truth, tested my algorithm on it with
varying parameters, and selected the optimal TPR/FPR pair. Is there
some literature on this approach? Or is there anything I am overlooking?

Regards,
Koen Vermeer
