Can anyone help with this please?
---
I have a set of N images. I train a classifier to label pixels in an image
as one of a set of classes. To estimate the accuracy of the classifier I use
k-fold cross-validation, training on k-1 folds and testing on the remaining
one. The estimated mean accuracy per image is then
mu = mean(mean[i], i=1..k)
where mean[i] is the mean accuracy across the images in fold i.
I also want to know how much the accuracy varies from one image to another.
I can think of two ways of estimating this:
(a) sigma^2 = mean(var[i], i=1..k)
where var[i] is the variance of the accuracy across the images in fold i,
or
(b) sigma^2 = var(mean[i], i=1..k) * n
where n is the number of images in each fold.
---
An example:
fold    mean      var
   1    91.43     36.2404
   2    89.05     58.3696
   3    97.39      3.3856
   4    89.38     78.1456
   5    91.09    104.858
   6    88.49     87.4225
   7    86.59    148.596
   8    90.36     97.8121
   9    86.05     77.6161
  10    88.98    125.44
n = 8 (fold size)
mu = 89.881
sigma^2 by (a) = 81.7886 (sigma = 9.0437)
sigma^2 by (b) = 71.7367 (sigma = 8.4698)
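For concreteness, both estimates can be reproduced from the fold table with a
few lines of Python (a sketch using only the standard library; note that
estimate (b) matches the figures above when the *population* variance of the
fold means is used, and the last digit differs slightly from 71.7367 only
because the tabulated fold means are rounded):

```python
import statistics

# Per-fold mean accuracy and per-fold variance of accuracy (from the table).
fold_means = [91.43, 89.05, 97.39, 89.38, 91.09,
              88.49, 86.59, 90.36, 86.05, 88.98]
fold_vars = [36.2404, 58.3696, 3.3856, 78.1456, 104.858,
             87.4225, 148.596, 97.8121, 77.6161, 125.44]
n = 8  # images per fold

# Overall accuracy estimate: the mean of the fold means.
mu = statistics.mean(fold_means)

# (a) Pool the within-fold variances.
sigma2_a = statistics.mean(fold_vars)

# (b) Scale up the between-fold variance of the fold means:
#     since var(mean of n images) is roughly sigma^2 / n, multiply by n.
sigma2_b = statistics.pvariance(fold_means) * n

print(round(mu, 3), round(sigma2_a, 4), round(sigma2_b, 4))
# -> 89.881 81.7886 71.7369
```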
---
Which estimate is better, or are both incorrect? I appreciate that the fold
size (8) and number of folds (10) are small. Is there a better way? Is there
any way to establish a confidence interval on the estimate?
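One common (if only approximate) sketch for a confidence interval on mu:
treat the k fold means as k observations and use a t interval. This is an
assumption on my part, not a definitive answer. Note the caveat that the fold
means are not fully independent, since the training sets overlap, so the
interval should be read as rough. The t quantile 2.262 below is the tabulated
value t_{0.975, 9} for k = 10 folds:

```python
import math
import statistics

fold_means = [91.43, 89.05, 97.39, 89.38, 91.09,
              88.49, 86.59, 90.36, 86.05, 88.98]
k = len(fold_means)

mu = statistics.mean(fold_means)
sd = statistics.stdev(fold_means)   # sample std dev of fold means (ddof=1)
se = sd / math.sqrt(k)              # standard error of mu

# 95% CI using the t distribution with k-1 = 9 degrees of freedom;
# t_{0.975, 9} ~= 2.262 (tabulated value, assumed here).
t_crit = 2.262
lo, hi = mu - t_crit * se, mu + t_crit * se
print(f"mu = {mu:.2f}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")
# -> mu = 89.88, 95% CI ~ [87.62, 92.14]
```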
Thanks
Mark
________________________________________________________________________
Mark Everingham Phone: +44 117 9545249
Room 1.15 Fax: +44 117 9545208
Merchant Venturers Building Email: [EMAIL PROTECTED]
University of Bristol WWW: http://www.cs.bris.ac.uk/~everingm/
Bristol BS8 1UB, UK