Two different strategies occur to mind, both of which might, I suppose,
be implemented severally:

1. Treat the image as an array of pixels, so that each pixel may be
thought of as lying at the intersection of a row and
a column of pixels. There are then the individual pixels (all RxC of
them) and three different aggregations: by row, by column, and by the
whole image. Seems to me this would permit an ANOVA-like analysis, using
for dependent variable some suitable error function between the known
label for each pixel and the classifier's label, with sources of
variation representing rows and columns (in neither of which you would
have much interest, I imagine), classifiers (whose main effect is
equivalent to the t-test you mention below), and interactions between
(classifiers and rows) and (classifiers and columns) (these latter two
representing different levels of aggregation than the whole image).
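The ANOVA-like layout described above can be sketched in plain Python
(standard library only). All names and the toy 0/1 error data are
hypothetical; err[k][r][c] is an error indicator (1 = misclassified)
for classifier k at pixel (r, c), and the sketch computes the
sums of squares for the classifier and row main effects and for the
classifier-by-row interaction (the column terms would be analogous):

```python
# Hypothetical toy data: 2 classifiers, 3 rows x 4 columns of 0/1
# per-pixel error indicators (1 = pixel misclassified).
err = [
    [[0, 1, 0, 0],     # classifier 1
     [0, 0, 1, 0],
     [1, 0, 0, 0]],
    [[0, 1, 1, 0],     # classifier 2
     [0, 1, 1, 0],
     [1, 0, 1, 0]],
]

K = len(err)          # number of classifiers
R = len(err[0])       # number of rows
C = len(err[0][0])    # number of columns
N = K * R * C

# Grand mean error rate over all classifiers and pixels.
grand = sum(err[k][r][c]
            for k in range(K) for r in range(R) for c in range(C)) / N

# Marginal means for the classifier and row factors.
mean_k = [sum(err[k][r][c] for r in range(R) for c in range(C)) / (R * C)
          for k in range(K)]
mean_r = [sum(err[k][r][c] for k in range(K) for c in range(C)) / (K * C)
          for r in range(R)]

# Main-effect sums of squares.
ss_classifier = R * C * sum((m - grand) ** 2 for m in mean_k)
ss_row = K * C * sum((m - grand) ** 2 for m in mean_r)

# Classifier x row interaction: cell means minus both main effects.
mean_kr = [[sum(err[k][r]) / C for r in range(R)] for k in range(K)]
ss_cxr = C * sum((mean_kr[k][r] - mean_k[k] - mean_r[r] + grand) ** 2
                 for k in range(K) for r in range(R))

print(ss_classifier, ss_row, ss_cxr)
```

Dividing each sum of squares by its degrees of freedom would then give
the mean squares for the usual F-ratios; with a 0/1 error indicator the
normality assumption is of course only approximate.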
2. Instead of the structural components (rows & columns), ...

On Sat, 19 Aug 2000, Mark Everingham wrote:
> I have two classifier systems which take as input an image and produce
> as output a label for each pixel in the image, for example the input
> might be of an outdoor scene, and the labels sky/road/tree etc.
>
> I have a set of images with the correct labels, so I can test how
> accurately a classifier performs by calculating for example the mean
> number of pixels correctly classified per image or the mean number of
> sky pixels correctly classified etc.
>
> The problem is this: Given *two* different classifiers, I want to test
> if the accuracy achieved by each classifier differs *significantly*. One
> way I can think of doing this is:
>
> for classifier 1,2
>   for each image
>     get % pixels correct
> calculate mean and sd across images
> apply t-test
>
> Because the images used for each classifier are the same, I assume I can
> use a paired t-test. Assuming the distribution of % correct across
> images is approximately normal, this should work fine.
>
> However, I have two nagging objections to this:
>
> i) the accumulation of statistics across *images* rather than any other
> unit is fairly arbitrary
>
> ii) because the *pixels* in each image are identical as well as the
> images, it seems to me that there may be a stronger statistic I can
> use, rather than just lumping all the pixels of an image together and
> taking the sum of correct pixels. The analogy I am thinking of is
> comparing performance on a pair of exams and looking at individual
> questions rather than just taking the overall number of correct
> responses.
>
> Anyone have any comments/ideas?
>
> Thanks in advance
> Mark
>
> ________________________________________________________________________
>
> Mark Everingham Phone: +44 117 9545249
> Room 1.15 Fax: +44 117 9545208
> Merchant Venturers Building Email: [EMAIL PROTECTED]
> University of Bristol WWW: http://www.cs.bris.ac.uk/~everingm/
> Bristol BS8 1UB, UK
>
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
> http://jse.stat.ncsu.edu/
> =================================================================
>
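The paired t-test sketched in Mark's pseudocode above could be realised,
for instance, as the following plain-Python fragment (standard library
only); the per-image accuracies here are invented toy numbers standing
in for "% pixels correct" per image:

```python
import math

# Hypothetical per-image accuracies ("% pixels correct", as fractions),
# for the same six test images under each classifier.
acc1 = [0.91, 0.88, 0.95, 0.90, 0.86, 0.93]  # classifier 1
acc2 = [0.89, 0.85, 0.94, 0.88, 0.84, 0.90]  # classifier 2

# Paired differences, one per image.
d = [a - b for a, b in zip(acc1, acc2)]
n = len(d)

mean_d = sum(d) / n
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance

# Paired t statistic with n - 1 degrees of freedom.
t = mean_d / math.sqrt(var_d / n)
print(round(t, 3))
```

The resulting t would be compared against the t distribution with
n - 1 degrees of freedom; the test assumes the per-image differences
are roughly normal, as Mark notes.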
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128