Greetings everyone,

I am Babak Alipour, a student at the University of Florida. I have been using
MADlib and was hoping to use kNN classification. Unfortunately it is not
available, so I decided to give implementing it a shot.

Looking at the issue tracker, I found two JIRAs regarding kNN: MADLIB-409
and MADLIB-927.
The more recent JIRA mentions that a naive implementation, or linearly
searching through the data, is expected.
I have a few questions regarding the details the JIRA doesn't specify:
Generally, what is the interface of the module? For example, where is the
user expected to provide k, whether to use distance weighting, and the
distance metric (Manhattan, Euclidean, or Minkowski with some p > 2)?
Another question is, how should the user specify the data points whose
k-nearest neighbors are desired? Is it some subset of the original data
table, or points from another data table with the same schema as the
original data table?
Also, are the output points to be kept in a separate table?
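
To make these questions concrete, here is a minimal Python sketch of the
naive linear-scan approach, with k, the Minkowski order p, and distance
weighting exposed as explicit parameters. The function name, signature, and
in-memory data layout are purely illustrative assumptions. This is not an
existing or proposed MADlib interface, and a real module would operate on
database tables rather than Python lists.

```python
import heapq
from collections import defaultdict


def knn_classify(train, query, k=3, p=2.0, weighted=False):
    """Naive kNN: scan every training row, keep the k closest, then vote.

    train: list of (features, label) pairs; query: list of numbers.
    Hypothetical signature for illustration only, not a MADlib API.
    """
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training rows")

    def minkowski(a, b):
        # p=1 gives Manhattan distance, p=2 gives Euclidean distance
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    # Linear scan: compute one distance per training row, keep the k smallest
    nearest = heapq.nsmallest(
        k,
        ((minkowski(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )

    votes = defaultdict(float)
    for dist, label in nearest:
        # Optional inverse-distance weighting; the epsilon avoids dividing by zero
        votes[label] += 1.0 / (dist + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)
```

In this sketch the query point is passed in directly, which sidesteps the
question of whether query points live in the original table or a separate
one, and the result is returned rather than stored in an output table.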

I'd love to hear some feedback from the community so that I can move
forward with the implementation.

Thanks in advance for your time.


Best regards,
*Babak Alipour,*
*University of Florida*
