Here is my simple solution. It's not as reusable as Jan-Pieter's, but I
think it highlights the simplicity of the core solution in the line
starting with "K=.". It prints 94.4% accuracy, which matches the blog post.

The CSV parser I'm using is from Zach Elliott (
http://www.jsoftware.com/pipermail/programming/2012-October/029686.html)

mj =. 256 $ 0         NB. X - Other
mj =. 1 (a.i.a.)}mj   NB. C - Char
mj =. 2 (a.i.',')}mj  NB. D - Delim
mj =. 3 (a.i.'"')}mj  NB. Q - Quote
mj =. 4 (a.i.'''')}mj NB. S - Single Quote



sj =. _2 ]\"1 }. ". ;._2 (0 : 0)
NB. X C D Q S
0 0 1 1 2 1 3 1 4 1 NB. 0 - Other
0 0 1 0 2 2 1 0 1 0 NB. 1 - Char
0 0 1 1 1 2 3 1 4 1 NB. 2 - Delim
0 0 3 0 3 0 5 0 3 0 NB. 3 - Quote
0 0 4 0 4 0 4 0 6 0 NB. 4 - SQuote
0 0 0 0 2 2 3 0 0 0 NB. 5 - Second Quote
0 0 0 0 2 2 0 0 4 0 NB. 6 - Second SQuote
)


parse_csv =. (0;sj;mj)&;:  NB. sequential machine: split one CSV line into boxed fields
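To sanity-check the parser interactively (a made-up sample line, not from
the data set): if I read the machine right, a comma inside quotes stays in
its field, and the quote characters themselves are kept, which is harmless
here since the MNIST CSVs contain only digits:

   parse_csv '5,"a,b",7'   NB. three boxed fields: 5  "a,b"  7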


read_data =: [: ({."1 ; }."1)@}. 0&do S:0 @: parse_csv ;._2 @: (1!:1) @ <  NB. read file, parse each line, numerify the boxed fields, drop the header row, split label column from pixels


NB. ---------------------------------------------------------


'TN T'=. read_data jpath'~home/trainingsample.csv'

'SN S'=. read_data jpath'~home/validationsample.csv'


distance =. [: +/ *:@:-  NB. squared Euclidean distance (sqrt omitted: it doesn't change the ordering)
imin =. i. <./           NB. index of the smallest item
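A quick check of the two helper verbs on toy vectors:

   0 3 distance 4 0   NB. 16 + 9 = 25
   imin 7 2 9 1 5     NB. 3, the index of the 1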


K=. S ([:imin distance"1)"1 _ T  NB. for each validation row, index of nearest training row
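The "1 _ rank pattern pairs each row of the left argument with the whole
right argument, so distance"1 gives a row of distances and imin picks the
nearest. On tiny made-up arrays (T0, S0 are hypothetical names):

   T0 =. 3 2 $ 0 0 10 10 5 5
   S0 =. 2 2 $ 1 1 9 9
   S0 ([: imin distance"1)"1 _ T0   NB. 0 1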


M =. SN ,. K { TN          NB. true label stitched beside predicted label
]P =: (# %~ [: +/ =/"1) M  NB. Accuracy: fraction of rows where the two labels agree
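The accuracy verb on a small made-up label table (M0 is hypothetical):
=/"1 tests each row for agreement, +/ counts matches, # %~ divides by the
row count.

   ] M0 =. 4 2 $ 1 1  2 3  4 4  5 5
   (# %~ [: +/ =/"1) M0   NB. 0.75, since 3 of the 4 rows match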




On Wed, Jun 11, 2014 at 6:33 AM, Joe Bogner <[email protected]> wrote:

> Thank you. For an unscientific, rough benchmark, it runs in 12.4 seconds on
> my machine vs the F# version which runs in 44 seconds. I was surprised. In
> other unrelated use cases, I was finding .NET to be faster than J. This is
> an example where J really shines. Your code is compact and readable too. I
> will study it and may try my own implementation. Thanks for the link to the
> PDF as well.
>
>
> On Wed, Jun 11, 2014 at 2:22 AM, Jan-Pieter Jacobs <
> [email protected]> wrote:
>
> > 2014-06-11 4:07 GMT+02:00 Joe Bogner <[email protected]>:
> > > Thanks Jan-Pieter, how would I recreate the results of the calculating
> > the
> > > % correct with yours? I will give it a shot on my own still later.. I
> > > pasted some code to help jumpstart the reading of the array of data:
> > >
> >
> > Thanks for the info!
> >
> > I just tried the classification of the data and this is what I get:
> >
> > NB. transformed your loader into a reusable verb.
> > parsefile =: 3 : 0
> > file =. fread y
> > header_end =. >: file i. LF
> > arr =. ". ];._2 header_end }. file
> > )
> >
> > NB. Load training and validation labels and data
> > Train      =: parsefile jpath '~temp/trainingsample.csv'
> > Validation =: parsefile jpath '~temp/validationsample.csv'
> >
> > NB. separate labels (1st column) from data (the rest)
> > 'TrainLabels TrainData'          =: ({."1 ; }."1) Train
> > 'ValidationLabels ValidationData'=: ({."1 ; }."1) Validation
> >
> > NB. Classify one against all:
> > predicted =: 10 nnClass oaa TrainLabels;TrainData;ValidationData
> >
> > NB. Assess the accuracy of our result:
> > OA =: 100 * (+/%#)@:=
> >
> > predicted OA ValidationLabels
> > 93.6
> >
> > I'd like to recommend the book that started me on implementing this all:
> > Elements of Statistical Learning
> > Trevor Hastie, Robert Tibshirani, Jerome Friedman
> > PDF Freely (legally too) available via
> > http://statweb.stanford.edu/~tibs/ElemStatLearn/
> >
> > In the future, I'd be interested in toying around with more advanced
> > classifiers, like Support Vector Machines.
> >
> > Jan-Pieter
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>
