Greetings everyone, I am Babak Alipour, a student at the University of Florida. I have been using MADlib and was hoping to use kNN classification, which is unfortunately not available, so I decided to give implementing it a shot.
Looking at the issue tracker, I found two JIRAs regarding kNN: MADLIB-409 and MADLIB-927. The more recent JIRA mentions that a naive implementation, i.e. a linear search through the data, is expected. I have a few questions about details the JIRA doesn't specify:

1. Generally, what is the interface of the module? For example, where is the user expected to provide k, whether to use distance weighting, and the distance metric (Manhattan, Euclidean, or Minkowski with some p > 2)?
2. How should the user specify the data points whose k nearest neighbors are desired? Is it some subset of the original data table, or points from another data table with the same schema as the original data table?
3. Are the output points to be kept in a separate table?

I'd love to hear some feedback from the community so that I can move forward with the implementation.

Thanks in advance for your time.

Best regards,
Babak Alipour
University of Florida