Can anyone help with this please?

---

I have a set of N images. I train a classifier to label pixels in an image
as one of a set of classes. To estimate the accuracy of the classifier I use
cross-validation with k folds, training on k-1 folds and testing on the
remaining one. Thus the estimated accuracy is

mu = mean(mean[i], i=1..k)

where mean[i] is the mean accuracy across the images in fold i.

I also want to know how much the accuracy varies from one image to another.
I can think of two ways of estimating this:

(a) sigma^2 = mean(var[i], i=1..k)

where var[i] is the variance of the accuracy across the images in fold i

or

(b) sigma^2 = var(mean[i], i=1..k) * n

where n is the number of images in each of the folds.
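To make the two estimators concrete, here is a quick sketch in numpy. The
per-image accuracies are made-up random numbers purely for illustration, and
I've assumed the population variance (ddof=0) throughout; with the sample
variance (ddof=1) the numbers shift slightly but the two estimators are
computed the same way:

```python
import numpy as np

# Hypothetical per-image accuracies: k = 10 folds of n = 8 images each
# (random numbers here, only to illustrate the two formulas).
rng = np.random.default_rng(0)
folds = [rng.normal(90.0, 9.0, size=8) for _ in range(10)]

n = len(folds[0])

fold_means = np.array([f.mean() for f in folds])
fold_vars = np.array([f.var(ddof=0) for f in folds])  # within-fold variance

mu = fold_means.mean()                  # overall mean accuracy
sigma2_a = fold_vars.mean()             # (a) mean of within-fold variances
sigma2_b = fold_means.var(ddof=0) * n   # (b) scaled variance of fold means
```

Note that with equal fold sizes and ddof=0, (a) plus the unscaled variance of
the fold means equals the variance of all N accuracies pooled together (the
law of total variance), which is an easy sanity check on the arithmetic.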

---

An example:

fold  mean   var
   1  91.43  36.2404
   2  89.05  58.3696
   3  97.39  3.3856
   4  89.38  78.1456
   5  91.09  104.858
   6  88.49  87.4225
   7  86.59  148.596
   8  90.36  97.8121
   9  86.05  77.6161
  10  88.98  125.44

n = 8 (fold size)

mu = 89.881
sigma^2 by (a) = 81.7886 (sigma = 9.0437)
sigma^2 by (b) = 71.7367 (sigma = 8.4698)
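For reference, the figures above can be reproduced from the per-fold table
with a few lines of numpy (I've assumed the population variance, ddof=0, when
taking the variance of the fold means in (b), since that matches the numbers
quoted):

```python
import numpy as np

# Per-fold statistics copied from the table above (k = 10 folds, n = 8 images)
fold_means = np.array([91.43, 89.05, 97.39, 89.38, 91.09,
                       88.49, 86.59, 90.36, 86.05, 88.98])
fold_vars = np.array([36.2404, 58.3696, 3.3856, 78.1456, 104.858,
                      87.4225, 148.596, 97.8121, 77.6161, 125.44])
n = 8

mu = fold_means.mean()                  # 89.881
sigma2_a = fold_vars.mean()             # 81.7886 -> sigma = 9.0437
sigma2_b = fold_means.var(ddof=0) * n   # 71.7369 -> sigma = 8.4698
```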

---

Which estimate is better, or are both incorrect? I appreciate that the fold
size (8) and number of folds (10) are small. Is there a better way? Is there
any way to establish a confidence interval on the estimate?

Thanks
Mark

________________________________________________________________________

Mark Everingham               Phone: +44 117 9545249
Room 1.15                     Fax:   +44 117 9545208
Merchant Venturers Building   Email: [EMAIL PROTECTED]
University of Bristol         WWW:   http://www.cs.bris.ac.uk/~everingm/
Bristol BS8 1UB, UK

=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================