Also interesting is taking a look at which instances the classifier confuses:

NB. taking a look at the training data
load'viewmat'
gray =: (,.(,.,.)) i. 256
gv =: (gray) viewmat ] NB. no colors please

NB. shape of the images (assuming square ones)
shape =: 2$%:{:$TrainData

NB. get means for each label
meanPerLabel =: TrainLabels (+/%#)/. TrainData
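For anyone following along outside J: the key adverb /. groups the rows of TrainData by their label and applies the mean verb (+/%#) to each group, with groups appearing in order of first occurrence (nub order). A rough NumPy sketch of the same idea (toy data and names are mine, not from the thread):

```python
import numpy as np

# Toy stand-ins for TrainLabels / TrainData
train_labels = np.array([1, 0, 1, 0])
train_data   = np.array([[2., 4.],
                         [0., 0.],
                         [4., 8.],
                         [2., 2.]])

# J:  TrainLabels (+/%#)/. TrainData
# mean of the rows sharing each label, in order of first occurrence
nub_order = list(dict.fromkeys(train_labels.tolist()))   # J:  ~. TrainLabels
mean_per_label = np.array([train_data[train_labels == k].mean(axis=0)
                           for k in nub_order])

print(nub_order)       # first label 1, then label 0
print(mean_per_label)  # one mean row per label, in nub order
```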

NB. keep the order (nub order)
labOrder =: ~. TrainLabels

NB. convert them into a set of stacked images
imPerLabel =: (shape&$)"1 meanPerLabel

NB. Now let's look at classifications gone wrong...
NB. Wrong data
wrongDat   =: (predicted ~: ValidationLabels) # ValidationData

NB. wrong labels
wrongLab   =: (predicted ~: ValidationLabels) # predicted

NB. True labels
correctLab =: (predicted ~: ValidationLabels) # ValidationLabels
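The three filters above all use the same boolean left argument to # (copy): keep only the positions where prediction and truth disagree. The equivalent boolean-mask filtering in NumPy, with toy data of my own invention:

```python
import numpy as np

# Toy stand-ins for predicted / ValidationLabels / ValidationData
predicted         = np.array([3, 1, 4, 1])
validation_labels = np.array([3, 1, 5, 9])
validation_data   = np.arange(8).reshape(4, 2)   # one row per sample

mask = predicted != validation_labels            # J:  predicted ~: ValidationLabels

wrong_dat   = validation_data[mask]              # misclassified inputs
wrong_lab   = predicted[mask]                    # what we predicted for them
correct_lab = validation_labels[mask]            # what they really were

print(wrong_lab)     # the two wrong predictions
print(correct_lab)   # the corresponding true labels
```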

NB. Label means going with the wrong predictions
wrongMeans   =: imPerLabel {~ labOrder i. wrongLab

NB. Label means going with the true labels
correctMeans =: imPerLabel {~ labOrder i. correctLab

NB. Scale images to the range 0..256 so all images get equal treatment
scale =: <.@([ * (% >./@,"2)@(- <./@,"2)@])
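Reading that tacit verb right to left: subtract each image's minimum, divide by the resulting maximum, multiply by the left argument, and floor. A Python sketch of the same min-max scaling (my own function name; assumes the image is not constant, or the division blows up):

```python
import numpy as np

def scale(k, y):
    """J:  scale =: <.@([ * (% >./@,"2)@(- <./@,"2)@])
    Per 2-D image: shift so the min is 0, divide by the max, scale by k, floor."""
    y = np.asarray(y, dtype=float)
    lo = y.min(axis=(-2, -1), keepdims=True)
    shifted = y - lo
    hi = shifted.max(axis=(-2, -1), keepdims=True)
    return np.floor(k * shifted / hi).astype(int)

img = np.array([[10., 20.],
                [30., 50.]])
print(scale(256, img))
```

Note that, exactly like the J verb, the brightest pixel maps to k itself (here 256), not k-1.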

NB. Make lists of images
wrongIms       =: 256 scale (,./ (shape&$)"1 wrongDat)
wrongMeanIms   =: 256 scale (,./ wrongMeans)
CorrectMeanIms =: 256 scale (,./ correctMeans)

NB. Display them as rows below each other
NB.   the misclassified inputs,
NB.   the label mean for the predicted label
NB.   the label mean of the true label
gv wrongIms , wrongMeanIms , CorrectMeanIms

NB. helper to convert RGB channels (1st dimension) to ints
RGBtoInt =: 256&#.&.|:
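The 256&#. is base-256 evaluation: it packs the three channel values of each pixel into a single integer, R*256^2 + G*256 + B, which is what viewmat's 'rgb' mode expects. A sketch of the same packing in Python (function name mine):

```python
import numpy as np

def rgb_to_int(channels):
    """J:  RGBtoInt =: 256&#.&.|:
    Pack a (3, h, w) channel stack into h x w integers: R*256^2 + G*256 + B."""
    r, g, b = np.asarray(channels, dtype=int)
    return r * 256 * 256 + g * 256 + b

# Red and green both at full intensity, blue absent -> yellow pixels
rgb = np.array([[[255]], [[255]], [[0]]])
print(hex(int(rgb_to_int(rgb)[0, 0])))
```

This is why a pixel present only in the input shows red, one present only in the class mean shows green, and the overlap shows yellow.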

NB. less confusing, look at them separately:
NB.   Red only in input, green only in class mean, Yellow: in both.
NB.   Red: only in input, Green: only in class mean, Yellow: in both.
'rgb' viewmat RGBtoInt wrongIms , wrongMeanIms   ,: 0
'rgb' viewmat RGBtoInt wrongIms , CorrectMeanIms ,: 0

NB. Look at overlaps:
'rgb' viewmat RGBtoInt wrongIms , wrongMeanIms   ,: CorrectMeanIms

In that last instance of viewmat, the three images are superimposed, one per channel:
Red:       only in the input
    Yellow:  input + predicted mean (bad overlap > misclassification)
Green:     only in the predicted label mean
    Cyan:    predicted mean + true mean (confusing overlap)
Blue:      only in the true label mean
    Magenta: input + true mean (good overlap)

White/gray: present equally in all three channels.

Maybe not useful, but pretty nonetheless.
This also reveals some issues with the data: badly aligned, scaled, or
rotated characters; extra bits and pieces not belonging to the
character itself, ...

Jan-Pieter.

PS: recently I found a neat trick for getting monospace fonts in Firefox:
right-click anywhere in the page and click "Inspect Element". Then, in
the CSS (right column), find an instance of font-family and add
"mono" to the beginning of the list. This updates the style sheet
instantly and displays nicely. It will stay around until you
refresh (?). Taking "mono" away again restores normality.

2014-06-11 8:22 GMT+02:00 Jan-Pieter Jacobs <[email protected]>:
> 2014-06-11 4:07 GMT+02:00 Joe Bogner <[email protected]>:
>> Thanks Jan-Pieter, how would I recreate the results of calculating the
>> % correct with yours? I will still give it a shot on my own later. I
>> pasted some code to help jumpstart the reading of the array of data:
>>
>
> Thanks for the info!
>
> I just tried the classification of the data and this is what I get:
>
> NB. transformed your loader into a reusable verb.
> parsefile =: 3 : 0
> file =. fread y
> header_end =. >: file i. LF
> ". ];._2 header_end }. file
> )
>
> NB. Load training and validation labels and data
> Train      =: parsefile jpath '~temp/trainingsample.csv'
> Validation =: parsefile jpath '~temp/validationsample.csv'
>
> NB. separate labels (1st column) from data (the rest)
> 'TrainLabels TrainData'          =: ({."1 ; }."1) Train
> 'ValidationLabels ValidationData'=: ({."1 ; }."1) Validation
>
> NB. Classify one against all:
> predicted =: 10 nnClass oaa TrainLabels;TrainData;ValidationData
>
> NB. Assess the accuracy of our result:
> OA =: 100 * (+/%#)@:=
>
> predicted OA ValidationLabels
> 93.6
>
> I'd like to recommend the book that started me on implementing this all:
> Elements of Statistical Learning
> Trevor Hastie, Robert Tibshirani, Jerome Friedman
> PDF Freely (legally too) available via
> http://statweb.stanford.edu/~tibs/ElemStatLearn/
>
> In the future, I'd be interested in toying around with more advanced
> classifiers, like Support Vector Machines.
>
> Jan-Pieter
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
