If you are only comparing distances (and not measuring them), you can skip
the square root and compare the sums of the squares directly.
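
For example, a quick J sketch of my own (dist and dist2 are just names I
made up here):

   NB. true Euclidean distance between two vectors
   dist =: [: %: [: +/ *:@-
   NB. squared distance: same ordering, no square root
   dist2 =: [: +/ *:@-
   3 4 dist 0 0     NB. 5
   3 4 dist2 0 0    NB. 25

Whichever point is closer under dist is also closer under dist2, since
squaring is monotonic for non-negative arguments.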

(Generally speaking, eliminating unnecessary operations is how you make
things faster. This can be taken to the point of silliness (code golf), but
running timings on representative data sets helps build up your
understanding of where the time actually gets spent.)

That said, I've not taken enough time with this problem to understand the
data. (For me, understanding the data is usually harder than understanding
the code. And even when the code takes effort to understand, I need to
understand the data before I can even start trying to understand the code.)

Thanks,

-- 
Raul



On Tue, Jun 10, 2014 at 4:57 PM, Jan-Pieter Jacobs <
[email protected]> wrote:

> Now that you mention it, I wrote an implementation in J doing just that...
>
> So here it is... stop reading here if you want to give it a go yourself...
>
> Caution: big spoiler ahead in 10
>
>
>
> 9
>
>
>
>
> 8
>
>
>
>
> 7
>
>
>
>
> 6
>
>
>
>
> 5
>
>
>
>
> 4
>
>
>
>
> 3
>
>
>
>
> 2
>
>
>
>
> 1
>
> In my implementation, I took instances as rows, and features
> (variables, or whatever you call them) as columns.
> If known labels Y (shape N x Q) correspond to training data X (N x P), and
> XTest (M x P) is to be classified,
>
> YTest =: k nnClass Y;X;XTest
>
> does the job for binary data. Y can in fact consist of more than one
> label per training element, in which case YTest has shape M x Q. Thanks
> to this, extending to multiclass problems is easy:
>
> YTest =: k nnClass oaa Y;X;XTest
>
> My tests indicate that the bottleneck remains the distance calculation...
> All suggestions for improvement are welcome.
> Sooner or later I'll put the whole thing on the wiki, including tools
> for doing random cross-validation.
>
> Implementation of this beauty below.
>
> NB. Distance between rows, different possibilities:
> NB. mind the &.: here: this one is the true Euclidean distance
> NB. d=: +/&.:*:@(-"1)/
>
> NB. D^2, avoiding the square root
> NB. d=: +/&:*:@(-"1)/
>
> NB. D^2 using integer math for speed: scale by 2^20 and floor first,
> NB. so the arithmetic runs on integers
> d=: +/&:*:@(-"1)/&([: <. (2^20) * ])
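>
> For instance, with some made-up points, x d y gives a (#x) by (#y) table
> of scaled squared distances, one row per row of x, so
>
> (2 2 $ 0 0 0 1) d (3 2 $ 0 0 0 1 1 1)
>
> is a 2 3 table; within a row, only the ordering matters.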
>
> NB. x nn y
> NB.   takes x nearest neighbors for the test data,
> NB.   returning the indices of the NN in the training data.
> NB.   If discarding the node itself is needed, add >:@ before i.
> nn =: i.@:[ ({ /:)"1 ({: d&> {.)@]
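>
> For example, with made-up data, the indices of the 2 nearest training
> rows for a single test row:
>
> 2 nn (3 2 $ 0 0 0 1 1 1) ; 1 2 $ 0 0.1   NB. -> 1 2 $ 0 1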
>
> NB. x nnReg y
> NB.   where x = k neighbors and
> NB.   y =: (training labels) ; (training data) ; (data to be classified)
> NB.   k-nn regression is simply taking the average over the labels of the
> NB.   k neighbors.
> NB.   Taking the mean over "_1 makes it extend to multidimensional y
> nnReg =: (+/%#)"_1@((nn }.) { >@{.@])
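>
> Continuing the same toy data, with one scalar label per training row:
>
> 2 nnReg (3 1 $ 0 1 1) ; (3 2 $ 0 0 0 1 1 1) ; 1 2 $ 0 0.1
> NB. -> 1 1 $ 0.5, the mean of the labels of training rows 0 and 1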
>
> NB. x nnClass y
> NB.   As before: x = k neighbors and
> NB.   y =: (training labels) ; (training data) ; (data to be classified)
> NB.   Going from regression to classification is a matter of taking the
> NB.   label with the highest score.
> NB.   Take the first maximum encountered per row to eliminate doubles.
> NB.   Note that, depending on the application, there are probably better
> NB.   ways of resolving multiple maxima.
> nnClass =: ((i.@# = (i. >./))"1)@:nnReg
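>
> And with one-hot labels (two classes; k=3, so all three rows vote):
>
> 3 nnClass (3 2 $ 1 0 0 1 0 1) ; (3 2 $ 0 0 0 1 1 1) ; 1 2 $ 0 0.1
> NB. -> 1 2 $ 0 1, as two of the three neighbors are labelled 0 1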
>
>
> NB. As nnClass and nnReg can handle multidimensional labels Y, extending to
> NB. a one-against-all multiclass classifier is trivial:
> oaa =: 1 : 0
> :
> (>{.y) fromOAA x u toOAA y
> )
>
> NB. toOAA y
> NB.   y is the same argument as for nnClass / nnReg; converts the plain
> NB.   labels in the head of y to one-against-all (one-hot) columns.
> toOAA =: |:@=&.>@{. , }.
>
> NB. x fromOAA y
> NB.   x =: original Y
> NB.   y =: class map as multidimensional labels from nnClass
> NB.   converts from multidimensional labels back to the original ones.
> fromOAA =: (] #"1 ~.@[)
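>
> A toy multiclass run (made-up data; Yc and Xc are just my names):
>
> Yc =: 'aabbcc'                 NB. one plain label per training row
> Xc =: 6 2 $ 0 0 0 1 4 4 4 5 8 8 8 9
> 3 nnClass oaa Yc ; Xc ; 1 2 $ 4 4.5
> NB. -> 1 1 $ 'b', since two of the three nearest rows are labelled b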
>
> Enjoy!
>
> 2014-06-10 21:52 GMT+02:00 Joe Bogner <[email protected]>:
> > I was reading this article http://huonw.github.io/2014/06/10/knn-rust.html
> > via hackernews https://news.ycombinator.com/item?id=7872398
> >
> > It seems like a good exercise for J. I don't have the time to do it over
> > the next few days but would be very interested in the results. I might
> > get around to it later in the week if no one else does.
> >
> > The data files appear to be here:
> > https://github.com/c4fsharp/Dojo-Digits-Recognizer/tree/master/Dojo
> >
> > I liked how clean the factor version was:
> > http://re-factor.blogspot.com/2014/06/comparing-k-nn-in-factor.html
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
