Thanks Jan-Pieter, how would I recreate the % correct results with your
implementation? I will still give it a shot on my own later. I pasted some
code below to help jumpstart reading the array of data:

For anyone who's interested, here's a small comparison of different
techniques to load the data:

require 'csv'

NB. data from https://raw.githubusercontent.com/c4fsharp/Dojo-Digits-Recognizer/master/Dojo/trainingsample.csv

PATH =: 'c:/users/jbogner/downloads/trainingsample.csv'

parsefile =: 3 : 0
training_file =: fread PATH
header_end =: >: training_file i. LF  NB. position just past the first LF (end of header)
arr =: ". ];._2 header_end }. training_file  NB. numerically parse each LF-terminated line
)

parsecsv =: 3 : 0
arr =: ". each }. readcsv PATH  NB. drop header row, parse each boxed cell separately
)


Note 'timing of fread'
timespacex 'parsefile'''''
3.94344 7.48989e7

$ arr
5000 785

)

Note 'timing of parsecsv'
timespacex 'parsecsv'''''
25.6972 4.01359e9


)


fread was the way to go vs parsecsv, since parsecsv needed to convert each
boxed cell to a number (5000x785 matrix).

Since each line is a comma-delimited list of numbers, ". can read each line
and parse it without boxing:

". '1,2,3,4'

1 2 3 4
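As for recreating the % correct: it's just the fraction of predicted labels
that match the known labels. A minimal sketch in Python (names and data are
illustrative, not from the J code in this thread):

```python
# % correct = fraction of predictions matching the known labels
# (illustrative names and data; not taken from the J code in this thread)
predicted = [3, 1, 4, 1, 5]
actual    = [3, 1, 4, 2, 5]

pct_correct = 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(pct_correct)  # 80.0
```

In J the same idea should be a one-liner along the lines of
(+/ % #) predicted = actual, i.e. the mean of an equality vector.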




On Tue, Jun 10, 2014 at 7:06 PM, Raul Miller <[email protected]> wrote:

> If you are only comparing distances (and not measuring them), you can skip
> the square root and just compare the magnitudes of the sums of the squares.
>
> (Generally speaking, eliminating unnecessary operations is how you make
> things faster - this can be taken to the point of silliness (code golf) but
> if you run timings on representative data sets, that can help build up your
> understanding about where time gets spent.)
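In other words, since sqrt is monotonic, ranking candidates by squared
distance picks the same nearest neighbor as ranking by true Euclidean
distance. A quick Python illustration (example data made up):

```python
import math

# Raul's point: sqrt is monotonic, so the argmin over squared distances
# is the same as the argmin over true distances -- the sqrt per
# candidate can be skipped. (Example data is made up for illustration.)
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

candidates = [[0, 3], [1, 1], [5, 2]]
query = [0, 0]

by_sq = min(range(len(candidates)), key=lambda i: sq_dist(query, candidates[i]))
by_true = min(range(len(candidates)), key=lambda i: math.sqrt(sq_dist(query, candidates[i])))
print(by_sq, by_true)  # 1 1
```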
>
> That said, I've not taken enough time to study this problem to understand
> the data. (For me, understanding the data is usually harder than
> understanding the code. And, even when the code needs effort to understand,
> I need to understand the data before I can even think about trying to
> understand the code.)
>
> Thanks,
>
> --
> Raul
>
>
>
> On Tue, Jun 10, 2014 at 4:57 PM, Jan-Pieter Jacobs <
> [email protected]> wrote:
>
> > Now that you mention it, I wrote an implementation in J doing just that...
> >
> > So here it is ... stop reading here if you want to give it a go yourself
> > ...
> >
> > Caution : big spoiler ahead in 10
> >
> >
> >
> > 9
> >
> >
> >
> >
> > 8
> >
> >
> >
> >
> > 7
> >
> >
> >
> >
> > 6
> >
> >
> >
> >
> > 5
> >
> >
> >
> >
> > 4
> >
> >
> >
> >
> > 3
> >
> >
> >
> >
> > 2
> >
> >
> >
> >
> > 1
> >
> > In my implementation, I took instances as rows, and features
> > (variables or whatever) in columns.
> > If known labels Y (shape N x Q) correspond to data X (N x P) , and
> > Xtest (M x P) are to be classified,
> >
> > YTest =: k nnClass Y;X;XTest
> >
> > does the job for binary data. Y can in fact consist of more than one
> > label per training element, and as such YTest has shape M x Q. Due to
> > this fact, it's easy to expand to multiclass problems:
> >
> > YTest =: k nnClass oaa Y;X;XTest
> >
> > My tests indicate the bottleneck remains the distance calculation...
> > All suggestions for improvement are welcome.
> > Sooner or later I'll put the entire thing on the wiki, including tools
> > for doing random cross-validation.
> >
> > Implementation of this beauty below.
> >
> > NB. Distance between rows, different possibilities:
> > NB. mind the &.: here: true Euclidean distance
> > NB. d=: +/&.:*:@(-"1)/
> >
> > NB. D^2 Avoiding squaring
> > NB. d=: +/&:*:@(-"1)/
> >
> > NB. D^2 using integer math for speed
> > d=: +/&:*:@(-"1)/&([: <. (2^20) * ])
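The verb above scales the data by 2^20 and floors it (<.) before
differencing, so the squared-distance sum runs in integer arithmetic. A
rough Python rendering of the same fixed-point idea (illustrative values,
not a benchmark):

```python
# Sketch of the fixed-point idea in the J verb above: scale by 2^20 and
# floor before differencing, so the squared-distance sum uses integer math.
# (For non-negative pixel data, int() truncation matches J's <. floor.)
SCALE = 2 ** 20

def to_fixed(row):
    return [int(x * SCALE) for x in row]

def sq_dist_fixed(a, b):
    return sum((x - y) ** 2 for x, y in zip(to_fixed(a), to_fixed(b)))

print(sq_dist_fixed([0.5, 0.25], [0.0, 0.0]))  # 343597383680
```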
> >
> > NB. x nn y
> > NB.   takes x nearest neighbors for the test data,
> > NB.   returning the indices of the NN in the training data.
> > NB.   If discarding the node itself is needed, add >:@ before i.
> > nn =: i.@:[ ({ /:)"1 ({: d&> {.)@]
> >
> > NB. x nnReg y
> > NB.   where x = k neighbors and
> > NB.   y =: (training labels) ; (training data) ; (data to be classified)
> > NB.   k-nn regression is simply taking the average over the label of k
> > neighbors.
> > NB.   Taking the mean over "_1 makes it extend to multidimensional y
> > nnReg =: (+/%#)"_1@((nn }.) { >@{.@])
> >
> > NB. x nnClass y
> > NB.   As before: x = k neighbors and
> > NB.   y =: (training labels) ; (training data) ; (data to be classified)
> > NB.   Going from regression to classification is a matter of taking
> > the highest one.
> > NB.   Take the first maximum encountered per row to eliminate any doubles.
> > NB.   Note that, depending on the application, there are probably better
> > NB.   ways of resolving multiple maxima.
> > nnClass =: ((i.@# = (i. >./))"1)@:nnReg
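For readers less fluent in J, the nnReg/nnClass pipeline above amounts to
roughly the following (a Python sketch with made-up data, not a literal
translation of the J verbs):

```python
# Rough sketch of the nnReg -> nnClass idea above (illustrative only):
# average the one-hot labels of the k nearest training rows, then take
# the first maximum as the predicted class (resolving ties as in the J).
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nn_class(k, labels, train, test_row, n_classes):
    # k nearest training rows by squared distance
    nearest = sorted(range(len(train)), key=lambda i: sq_dist(train[i], test_row))[:k]
    # k-NN "regression": mean of one-hot labels per class
    votes = [sum(labels[i] == c for i in nearest) / k for c in range(n_classes)]
    # classification: first maximum encountered wins
    return votes.index(max(votes))

train  = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = [0, 0, 1, 1]
print(nn_class(3, labels, train, [5, 4], 2))  # 1
```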
> >
> >
> > NB. As nnClass and nnReg can handle multidimensional labels Y, extending to
> > NB. a one-against-all multi-class classifier is trivial:
> > oaa =: 1 : 0
> > :
> > (>{.y) fromOAA x u toOAA y
> > )
> >
> > NB. convert y
> > NB.   y is the argument as for nnClass / nnReg
> > toOAA =: |:@=&.>@{. , }.
> >
> > NB. x fromOAA y
> > NB.   x=: original Y
> > NB.   y=: class map as multi-dimensional labels from nnClass
> > NB.  converts from multi-dimensional labels to original ones.
> > fromOAA =: (] #"1 ~.@[)
> >
> > Enjoy!
> >
> > 2014-06-10 21:52 GMT+02:00 Joe Bogner <[email protected]>:
> > > I was reading this article http://huonw.github.io/2014/06/10/knn-rust.html
> > > via hackernews https://news.ycombinator.com/item?id=7872398
> > >
> > > It seems like a good exercise for J. I don't have the time to do it over
> > > the next few days but would be very interested in the results. I might
> > > get around to it later in the week if no one else does.
> > >
> > > The data files appear to be here:
> > > https://github.com/c4fsharp/Dojo-Digits-Recognizer/tree/master/Dojo
> > >
> > > I liked how clean the factor version was:
> > > http://re-factor.blogspot.com/2014/06/comparing-k-nn-in-factor.html
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>