Here is my simple solution. It's not as reusable as Jan-Pieter's, but I think it highlights the simplicity of the core solution on the line with "K=.". It prints 94.4% accuracy, which matches the blog post.
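The line with "K=." is a 1-nearest-neighbour classifier. Each validation row takes the label of the training row with the smallest sum of squared differences. The sketch below shows just that step. It uses the same verb definitions as the solution below, but on hypothetical toy data so the result can be checked by hand: three two-pixel training "images" labelled 0 1 2, and two validation rows. It reuses the names TN, T, SN and S, so run it in a fresh session rather than alongside the real data.

```j
NB. Hypothetical toy data (not from the CSV files): 2 "pixels" per image
TN =: 0 1 2                         NB. training labels
T  =: 3 2 $ 0 0  5 5  9 9           NB. training data, one row per image
SN =: 0 1                           NB. validation labels (made up)
S  =: 2 2 $ 1 1  8 9                NB. validation data

NB. Verbs as in the solution
distance =: [: +/ *:@:-             NB. x distance y: sum of squared differences
imin =: i. <./                      NB. index of the smallest item
K =: S ([: imin distance"1)"1 _ T   NB. nearest training row for each row of S
M =: SN ,. K { TN                   NB. actual label ,. predicted label
P =: (# %~ [: +/ =/"1) M            NB. fraction of rows where the two agree

K   NB. 0 2
P   NB. 0.5
```

The first validation row (1 1) is nearest to 0 0 and the second (8 9) is nearest to 9 9, so K is 0 2. The second row's made-up true label is 1, so one of the two predictions is wrong and P is 0.5.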
I'm using the CSV parser from Zach Elliott (http://www.jsoftware.com/pipermail/programming/2012-October/029686.html):

mj =. 256 $ 0            NB. X - Other
mj =. 1 (a.i.a.)}mj      NB. C - Char
mj =. 2 (a.i.',')}mj     NB. D - Delim
mj =. 3 (a.i.'"')}mj     NB. Q - Quote
mj =. 4 (a.i.'''')}mj    NB. S - Single Quote

sj =. _2]\"1 }.".;._2 (0 : 0)
NB. X    C    D    Q    S
 0 0  1 1  2 1  3 1  4 1   NB. 0 - Other
 0 0  1 0  2 2  1 0  1 0   NB. 1 - Char
 0 0  1 1  1 2  3 1  4 1   NB. 2 - Delim
 0 0  3 0  3 0  5 0  3 0   NB. 3 - Quote
 0 0  4 0  4 0  4 0  6 0   NB. 4 - SQuote
 0 0  0 0  2 2  3 0  0 0   NB. 5 - Second Quote
 0 0  0 0  2 2  0 0  4 0   NB. 6 - Second SQuote
)

parse_csv =. (0;sj;mj)&;:
NB. read the file, cut it into lines, parse each line and convert the fields
NB. to numbers, then drop the header row and split off the label column
read_data=: [:({."1;}."1)@}. 0&do S:0 @: parse_csv ;._2 @: (1!:1) @ <

NB. ---------------------------------------------------------
'TN T'=. read_data jpath'~home/trainingsample.csv'
'SN S'=. read_data jpath'~home/validationsample.csv'
distance=. [:+/*:@:-                 NB. sum of squared differences
imin=. i.<./                         NB. index of the smallest item
K=. S ([:imin distance"1)"1 _ T      NB. nearest training row for each validation row
M=. SN ,. K { TN                     NB. actual label ,. predicted label
]P=: (# %~ [:+/=/"1) M               NB. Accuracy

On Wed, Jun 11, 2014 at 6:33 AM, Joe Bogner wrote:

> Thank you. For an unscientific, rough benchmark, it runs in 12.4 seconds on
> my machine vs. the F# version, which runs in 44 seconds. I was surprised. In
> other, unrelated use cases, I was finding .NET to be faster than J. This is an
> example where J really shines. Your code is compact and readable too. I
> will study it and may try my own implementation. Thanks for the link to the
> PDF as well.
>
>
> On Wed, Jun 11, 2014 at 2:22 AM, Jan-Pieter Jacobs wrote:
>
> > 2014-06-11 4:07 GMT+02:00 Joe Bogner:
> > > Thanks Jan-Pieter, how would I recreate the results of calculating the
> > > % correct with yours? I will still give it a shot on my own later. I
> > > pasted some code to help jumpstart the reading of the array of data:
> > >
> >
> > Thanks for the info!
> >
> > I just tried the classification of the data and this is what I get:
> >
> > NB. transformed your loader into a reusable verb.
> > parsefile =: 3 : 0
> >   file =. fread y
> >   header_end =. >: file i. LF
> >   arr =. ". ];._2 header_end }. file
> > )
> >
> > NB. Load training and validation labels and data
> > Train =: parsefile jpath '~temp/trainingsample.csv'
> > Validation =: parsefile jpath '~temp/validationsample.csv'
> >
> > NB. separate labels (1st column) from data (the rest)
> > 'TrainLabels TrainData' =: ({."1 ; }."1) Train
> > 'ValidationLabels ValidationData'=: ({."1 ; }."1) Validation
> >
> > NB. Classify one against all:
> > predicted =: 10 nnClass oaa TrainLabels;TrainData;ValidationData
> >
> > NB. Assess the accuracy of our result:
> > OA =: 100 * (+/%#)@:=
> >
> > predicted OA ValidationLabels
> > 93.6
> >
> > I'd like to recommend the book that started me on implementing all this:
> > Elements of Statistical Learning
> > Trevor Hastie, Robert Tibshirani, Jerome Friedman
> > PDF freely (legally too) available via
> > http://statweb.stanford.edu/~tibs/ElemStatLearn/
> >
> > In the future, I'd be interested in toying around with more advanced
> > classifiers, like Support Vector Machines.
> >
> > Jan-Pieter
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
