[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502686#comment-14502686 ]
Chiwan Park commented on FLINK-1745:
------------------------------------

I suggest a KNN usage scenario like the following:

{code}
val trainingDS: DataSet[LabeledVector] = ... // training data
val distanceMeasure = ... // a distance measure for computing the distance between two vectors (e.g. Euclidean, cosine, or Manhattan distance)
val knn = KNN().setK(10).setDistanceMeasure(distanceMeasure)
val model: KNNModel = knn.fit(trainingDS)

val testingDS = ... // testing data
val predictionDS: DataSet[LabeledVector] = model.transform(testingDS)
// We can also provide KNNModel.transform(Vector) for predicting a single vector.
{code}

The method and class names are inspired by the CoCoA implementation in flink-ml. :) The distance measure concept is inspired by the Mahout implementation (https://github.com/apache/mahout/blob/master/mr/src/main/java/org/apache/mahout/common/distance/DistanceMeasure.java).

I think we need a separate issue for the distance measure. How about this scenario?

> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial
> it is still used as a means to classify data and to do regression.
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
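
To make the distance measure idea from the comment more concrete, here is a minimal sketch in plain Scala, with no Flink dependency. All names in it (DistanceMeasure, EuclideanDistance, ManhattanDistance, CosineDistance, LocalKnn) are hypothetical and are not existing flink-ml API. Vectors are plain Array[Double] and labels are Double, which is an assumption made only to keep the example self-contained. The brute-force majority vote stands in for whatever a distributed KNNModel.transform would eventually do.

```scala
// Hypothetical sketch: none of these names exist in flink-ml.

/** A pluggable distance between two vectors of equal dimension. */
trait DistanceMeasure extends Serializable {
  def distance(a: Array[Double], b: Array[Double]): Double

  protected def checkDims(a: Array[Double], b: Array[Double]): Unit =
    require(a.length == b.length, "vectors must have the same dimension")
}

object EuclideanDistance extends DistanceMeasure {
  def distance(a: Array[Double], b: Array[Double]): Double = {
    checkDims(a, b)
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }
}

object ManhattanDistance extends DistanceMeasure {
  def distance(a: Array[Double], b: Array[Double]): Double = {
    checkDims(a, b)
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }
}

object CosineDistance extends DistanceMeasure {
  def distance(a: Array[Double], b: Array[Double]): Double = {
    checkDims(a, b)
    val dot   = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    require(normA > 0 && normB > 0, "cosine distance is undefined for zero vectors")
    1.0 - dot / (normA * normB)
  }
}

/** Brute-force kNN classification on a local collection (assumption: data fits in memory). */
object LocalKnn {
  def predict(
      training: Seq[(Double, Array[Double])], // (label, features)
      query: Array[Double],
      k: Int,
      measure: DistanceMeasure): Double = {
    require(k > 0, "k must be positive")
    require(training.nonEmpty, "training set must not be empty")
    // Take the k training points closest to the query.
    val neighbours = training.sortBy { case (_, v) => measure.distance(v, query) }.take(k)
    // Majority vote over neighbour labels; ties are broken arbitrarily.
    neighbours.groupBy(_._1).maxBy(_._2.size)._1
  }
}
```

With this shape, switching from Euclidean to cosine distance only changes the argument passed to LocalKnn.predict, which is the same decoupling that setDistanceMeasure(distanceMeasure) aims for in the scenario above.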