"Koen Vermeer" <[EMAIL PROTECTED]> wrote in message 
news:<[EMAIL PROTECTED]>...
 
> I have an experiment in which I have to design some algorithm that detects
> something. My question is related to the evaluation of the performance of
> the algorithm. Normally, I do this with the true positive rate (TPR) and
> the false positive rate (FPR) (or sensitivity/specificity), or a full ROC.
> However, in this case, I am a bit worried if my method is right.

Your measure of performance is not clear.

Typically, an algorithm is designed to minimize or maximize an
objective function that reflects a weighted (or constrained) balance
of a positive-class rate (either the TPR, i.e., detection, or the
FNR, i.e., leakage) and a negative-class rate (e.g., the FPR, i.e.,
false alarm).

The algorithm is then evaluated by the value of the objective 
function obtained on an independent test data set whose specific
characteristics were in no way used to design the algorithm.
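
For concreteness, here is a minimal Python sketch of that kind of
evaluation, assuming a simple weighted cost over the FNR and FPR and
0/1 label arrays with both classes present (the function and weight
names are only illustrative):

import numpy as np

def weighted_cost(y_true, y_pred, w_fn=1.0, w_fp=1.0):
    # Weighted balance of the false negative and false positive rates,
    # computed on an independent test set whose characteristics were
    # in no way used to design or calibrate the detector.
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    fnr = (y_true & ~y_pred).sum() / y_true.sum()
    fpr = (~y_true & y_pred).sum() / (~y_true).sum()
    return w_fn * fnr + w_fp * fpr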
 
> I have two datasets. One of them contains only negative samples (that is,
> samples that do not contain the something I am looking for). The other one
> contains both positive and negative samples. The number of samples in the
> first set is N_1, the number of samples in the second set is N_2. I cannot
> let an expert take a look at set 2 to classify all samples. I could,
> however, let an expert take a look at a small subset of set 2.
> 
> The method I am trying to apply is to start with the negative sample set.
> I run my algorithm on this set, and count the true negatives (TN_1) and
> false positives (FP_1). By changing my parameters, I can select a
> TN_1/FP_1 pair. With this data, I can calculate the FPR. I select the FPR
> I like, and use the parameters associated with it for my algorithm. 

Set 1 (the N_1 negative samples) is used for calibration (e.g., to
set false alarm thresholds). So you have a Neyman-Pearson-type
objective: minimize the FNR while the FPR is held fixed.
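
A minimal sketch of that calibration step, assuming the detector
produces a score per sample and declares a detection when the score
exceeds a threshold (the names below are only illustrative):

import numpy as np

def calibrate_threshold(neg_scores, target_fpr):
    # Place the threshold at the (1 - target_fpr) quantile of the
    # scores obtained on the negative-only calibration set (set 1),
    # so that roughly a fraction target_fpr of purely negative
    # samples is flagged as positive.
    return np.quantile(np.asarray(neg_scores, dtype=float),
                       1.0 - target_fpr)

The FNR/TPR is then estimated on set 2, which played no part in
choosing the threshold.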

> Now I
> run my algorithm, with these parameters, on set 2. I get a number of
> positive responses, which is the sum of FP_2 and TP_2. I can let an expert
> take a look at these and decide which ones are FPs and which are TPs. With
> my specified FPR, I can calculate TN_2 from FP_2. With N_2, I can now
> calculate FN_2 and the TPR.
> 
> I am certain my results will be a bit biased. 

They don't appear to be. Set 1, which is used to calibrate the
algorithm, is not used to estimate the rates used for performance
evaluation.
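
For what it is worth, here is the arithmetic I read your procedure
as, in a small Python sketch. It assumes, as your description does,
that the FPR calibrated on set 1 carries over to the negatives in
set 2 (the variable names are only illustrative):

def estimate_rates(n2, tp2, fp2, fpr):
    # n2  : total number of samples in set 2
    # tp2 : expert-confirmed true positives among the flagged samples
    # fp2 : expert-confirmed false positives among the flagged samples
    # fpr : false positive rate fixed during calibration on set 1
    tn2 = fp2 * (1.0 - fpr) / fpr   # from FPR = FP_2 / (FP_2 + TN_2)
    n_neg = fp2 + tn2               # estimated negatives in set 2
    n_pos = n2 - n_neg              # estimated positives in set 2
    fn2 = n_pos - tp2
    tpr = tp2 / n_pos
    return tn2, fn2, tpr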

> But I feel they are less
> biased than if I had a ground truth, tested my algorithm on it with
> varying parameters, and selected the optimal TPR/FPR set. Is there
> some literature on this approach? Or is there anything I am overlooking?

Just don't evaluate with calibration data.

Hope this helps.

Greg