Thank you for the suggestion.

2014-06-11 1:06 GMT+02:00 Raul Miller <[email protected]>:
> If you are only comparing distances (and not measuring them), you can skip
> the square root and just compare the magnitudes of the sums of the squares.

I first started out using Euclidean distance:

d=: +/&.:*:@(-"1)/

Eliminating the square root and using d^2 instead of d does indeed cut
the execution time by about a third:

d=: +/&:*:@(-"1)/
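The reason this is safe: the square root is monotone increasing on non-negative inputs, so ordering by squared distance is the same as ordering by distance. A quick sanity check in Python (the points here are made up for illustration, not from the thread):

```python
# Squared Euclidean distance preserves nearest-neighbour ordering,
# because sqrt is monotone increasing on non-negative inputs.
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

query = (1.0, 2.0)
points = [(0.0, 0.0), (1.5, 2.5), (4.0, 1.0)]

# Nearest neighbour by squared distance vs. by true distance
by_sq = min(points, key=lambda p: sq_dist(query, p))
by_d  = min(points, key=lambda p: math.sqrt(sq_dist(query, p)))
assert by_sq == by_d  # same winner either way
```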

A further reduction in execution time can be obtained by using integer arithmetic:

d=: +/&:*:@(-"1)/&([: <. (2^20) * ])

This cuts the time needed again by a third or so.
I have yet to find a way to adapt the amount of shift to the
granularity and range of the data, though...
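The fixed-point trick above (scale by 2^20, floor to integers, then take squared differences) can be sketched outside J as well. A hedged Python illustration; the shift of 20 bits mirrors the J verb, and the data values are invented:

```python
# Fixed-point squared distance: scale floats by 2^shift, floor to
# integers, then compare integer squared distances. Comparisons are
# then pure integer arithmetic; the true scale factor is (2^shift)^2.
import math

SHIFT = 20  # same scale as the (2^20) in the J verb

def to_fixed(v, shift=SHIFT):
    return [math.floor(x * (1 << shift)) for x in v]

def sq_dist_int(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [0.25, 0.5]
b = [0.75, 0.125]

# Integer squared distance, rescaled back for comparison with the
# exact floating-point value.
approx = sq_dist_int(to_fixed(a), to_fixed(b)) / float((1 << SHIFT) ** 2)
exact = sum((x - y) ** 2 for x, y in zip(a, b))
assert abs(approx - exact) < 1e-9
```

Choosing the shift automatically from the data's range and granularity (large enough to keep precision, small enough that the squares stay in exact integer range) is exactly the open question mentioned above; the fixed 20-bit shift here is just the constant from the J verb.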

> That said, I've not taken enough time to study this problem to understand
> the data. (For me, understanding the data is usually harder than
> understanding the code. And, even when the code needs effort to understand,
> I need to understand the data before I can even think about trying to
> understand the code.)

Well, only the format of the data matters; what type it is or where it
comes from matters less.
The idea is that a single instance is represented by a label (or
class) and a vector of P variables (in the case of digit recognition,
the value of each pixel in an image).
Stacking these vectors in the natural direction gives you an N x P
matrix of data, along with an N-vector of labels.

Then there is the validation set, which has M labels for its M
instances (again P-vectors). These validation labels are not used in
the prediction, but only to assess the accuracy of the classifier by
comparing them with the predicted classes.
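The held-out evaluation described here can be sketched as a minimal 1-nearest-neighbour example in Python; the names and the toy data are illustrative, not from the thread:

```python
# Minimal 1-NN classification: validation labels are used only to
# score accuracy, never during prediction.
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train_X, train_y, x):
    # Label of the training instance nearest to x (squared distance
    # suffices, per the discussion above).
    i = min(range(len(train_X)), key=lambda j: sq_dist(train_X[j], x))
    return train_y[i]

train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]  # N x P data
train_y = ['a', 'a', 'b', 'b']              # N labels
valid_X = [(1, 0), (5, 6)]                  # M instances (P-vectors)
valid_y = ['a', 'b']                        # M held-out labels

preds = [predict(train_X, train_y, x) for x in valid_X]
accuracy = sum(p == t for p, t in zip(preds, valid_y)) / len(valid_y)
```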

Basically, if your data has enough correlation with the labels, you
can shove in any type of data.

Jan-Pieter
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
