[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507333#comment-14507333 ]

Till Rohrmann commented on FLINK-1745:
--------------------------------------

Hi Raghav,

If I understand correctly, approaches 1 and 2 implement the same approximate 
kNN algorithm. The only difference is that the first paper implements it on 
MapReduce and the second one on a relational database.

I personally think that we should eventually add an approximate kNN 
implementation to the ML library, because we want to scale to large amounts of 
data. The exact implementations can still serve as a good baseline, though.
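For reference, an exact baseline is just a brute-force scan over the training set. Below is a minimal sketch in plain Scala collections, not the Flink ML API; `ExactKnn`, `LabeledPoint` and the method names are hypothetical. It ranks every training point by squared Euclidean distance, then takes a majority vote for classification or the mean label for regression:

```scala
// Brute-force (exact) kNN sketch on plain Scala collections.
// Not Flink ML code: ExactKnn, LabeledPoint and all method names are hypothetical.
object ExactKnn {
  // Hypothetical labeled training point: a feature vector plus a numeric label.
  final case class LabeledPoint(features: Array[Double], label: Double)

  // Squared Euclidean distance; the square root is not needed for ranking.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "feature vectors must have equal dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Rank every training point by distance to the query and keep the k closest.
  def nearest(train: Seq[LabeledPoint], query: Array[Double], k: Int): Seq[LabeledPoint] = {
    require(k > 0, "k must be positive")
    train.sortBy(p => squaredDistance(p.features, query)).take(k)
  }

  // Classification: majority vote among the k neighbours
  // (ties broken in favour of the smallest label, an arbitrary choice here).
  def classify(train: Seq[LabeledPoint], query: Array[Double], k: Int): Double =
    nearest(train, query, k)
      .groupBy(_.label)
      .toSeq
      .maxBy { case (label, pts) => (pts.size, -label) }
      ._1

  // Regression: mean label of the k neighbours.
  def regress(train: Seq[LabeledPoint], query: Array[Double], k: Int): Double = {
    val nn = nearest(train, query, k)
    nn.map(_.label).sum / nn.size
  }
}
```

Each query costs O(n log n) for n training points here (O(n log k) with a bounded priority queue), which is why an exact scan is only practical as a baseline at scale.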

IMHO, the problem with zkNN is computing the z-value for double-based 
feature vectors. There is another paper 
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4118757 which 
implements a different approximation algorithm for kNN. It might be an 
alternative to zkNN, or at least a good point of comparison for it.
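To illustrate why doubles are awkward: a z-value interleaves the bits of integer coordinates, so each double feature first has to be mapped onto a fixed-width integer grid, which needs known per-dimension bounds and a chosen resolution. The sketch below assumes both (the `ZValue` object, the `lo`/`hi` bounds and the `bits` parameter are hypothetical) and shows one simple scale-and-quantize mapping, not necessarily what the zkNN paper prescribes:

```scala
// Sketch: z-value (Morton code) for a double-valued feature vector.
// Assumption: each dimension lies in a known range [lo, hi] and is quantized
// to `bits` bits; this quantization step is where precision is lost.
object ZValue {
  // Map x in [lo, hi] to an integer cell in [0, 2^bits - 1].
  def quantize(x: Double, lo: Double, hi: Double, bits: Int): Long = {
    require(hi > lo, "range must be non-empty")
    val maxCell = (1L << bits) - 1
    val t = (x - lo) / (hi - lo)                  // normalize to [0, 1]
    val clamped = math.min(1.0, math.max(0.0, t)) // clamp out-of-range values
    math.round(clamped * maxCell)
  }

  // Interleave the bits of the quantized coordinates, most significant bit first:
  // bit b of dimension d ends up at position b * dims + (dims - 1 - d).
  def zValue(features: Array[Double], lo: Array[Double], hi: Array[Double], bits: Int): Long = {
    val dims = features.length
    require(lo.length == dims && hi.length == dims, "bounds must match dimension")
    require(bits > 0 && dims * bits <= 63, "result must fit in a non-negative Long")
    val cells = Array.tabulate(dims)(d => quantize(features(d), lo(d), hi(d), bits))
    var z = 0L
    for (b <- bits - 1 to 0 by -1; d <- 0 until dims) {
      z = (z << 1) | ((cells(d) >>> b) & 1L)
    }
    z
  }
}
```

Points that land in the same grid cell get the same z-value, so `bits` trades precision against the 63-bit budget of a `Long`. For high-dimensional feature vectors, dims * bits exceeds 63 quickly, so a real implementation would need something like `BigInt` or a bit array instead.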

Maybe each of you [~raghav.chalapa...@gmail.com] [~chiwanpark] could pick one 
algorithm to implement, and then we give users of the ML library the choice 
of whichever suits them best. What do you think?

> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Chiwan Park
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression.
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]


