[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507333#comment-14507333 ]
Till Rohrmann commented on FLINK-1745:
--------------------------------------

Hi Raghav,

if I understood it correctly, approaches 1 and 2 implement the same approximate kNN algorithm. The only difference is that the first paper implements it on MapReduce, while the second implements it on a relational database. I personally think we should eventually add an approximate kNN implementation to the ML library, because we want to scale to large amounts of data. The exact implementations can still act as good baseline methods, though.

The problem with zkNN, IMHO, is calculating the z-value for double-based feature vectors. There is another paper, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4118757, which implements a different approximation algorithm for kNN. This might be an alternative to zkNN, or at least it can serve as a good point of comparison for it.

Maybe each of you [~raghav.chalapa...@gmail.com] [~chiwanpark] picks one algorithm to implement, and then we give the user of the ML library the choice to select whichever suits them best. What do you think?

> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Chiwan Park
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, it is still used as a means to classify data and to do regression.
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
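To make the z-value difficulty concrete, here is a minimal sketch of how a z-value (Morton code) can be computed for a 2-dimensional point. It assumes the doubles are first quantized to fixed-width unsigned integers over a known [min, max] range; that quantization step is an illustrative assumption of this sketch, not the zkNN paper's exact method, and handling arbitrary unbounded doubles is precisely the open problem noted above. Class and method names (`ZknnSketch`, `quantize`, `zValue`) are hypothetical.

```java
import java.util.Arrays;

public class ZknnSketch {

    // Quantize a double in [min, max] to an unsigned 16-bit integer.
    // Fixing 16 bits and assuming a known [min, max] range is an
    // illustrative assumption -- mapping unbounded doubles to z-values
    // is exactly the difficulty raised in the comment.
    static int quantize(double v, double min, double max) {
        double clamped = Math.min(Math.max(v, min), max);
        return (int) ((clamped - min) / (max - min) * 65535.0);
    }

    // Interleave the bits of two 16-bit coordinates into a 32-bit
    // z-value (Morton code): bit i of x goes to bit 2i of z,
    // bit i of y goes to bit 2i + 1.
    static long zValue(int x, int y) {
        long z = 0L;
        for (int i = 0; i < 16; i++) {
            z |= ((long) (x >> i) & 1L) << (2 * i);
            z |= ((long) (y >> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] points = { {0.1, 0.2}, {0.9, 0.8}, {0.15, 0.25} };
        long[] zs = new long[points.length];
        for (int i = 0; i < points.length; i++) {
            zs[i] = zValue(quantize(points[i][0], 0.0, 1.0),
                           quantize(points[i][1], 0.0, 1.0));
        }
        // zkNN exploits the fact that points close in space tend to be
        // close in z-order: sorting by z-value lets a query's candidate
        // neighbours be found by scanning around its position in the
        // sorted order instead of computing all pairwise distances.
        Arrays.sort(zs);
        System.out.println(Arrays.toString(zs));
    }
}
```

The sorted z-values only approximate spatial proximity (the z-curve has discontinuities), which is why zkNN uses several randomly shifted copies of the data to bound the error.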