[https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697106#comment-14697106]

ASF GitHub Bot commented on FLINK-1745:
---------------------------------------

Github user kno10 commented on the pull request:

    https://github.com/apache/flink/pull/696#issuecomment-131123325
  
    On low-dimensional data, exact kNN may be feasible using grid-based 
approaches even for very large data sets. It's not very "sexy" to implement 
this, but it's also not very hard.
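Here is a rough illustration of the grid-based idea for 2-D points. The sketch buckets points into square cells of side `cell`. It then searches rings of cells around the query's cell and stops once the k-th best distance is no larger than the distance to any cell not yet visited. The names `build_grid` and `grid_knn` and the cell size are hypothetical and are not taken from the pull request or from Flink. This is a sequential sketch, not a distributed implementation.

```python
import heapq
import math
from collections import defaultdict

def build_grid(points, cell):
    # map each (cx, cy) cell index to the ids of the points it contains
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid

def grid_knn(points, grid, cell, q, k):
    """Exact kNN of q: returns a sorted list of (distance, point id)."""
    k = min(k, len(points))
    if k == 0:
        return []
    cx, cy = math.floor(q[0] / cell), math.floor(q[1] / cell)
    best = []  # max-heap of the current k best, stored as (-distance, id)
    seen, r = 0, 0
    while True:
        # scan the (2r+1) x (2r+1) square, but only process cells on its boundary ring
        for i in range(cx - r, cx + r + 1):
            for j in range(cy - r, cy + r + 1):
                if max(abs(i - cx), abs(j - cy)) != r:
                    continue
                for idx in grid.get((i, j), ()):
                    seen += 1
                    d = math.dist(q, points[idx])
                    if len(best) < k:
                        heapq.heappush(best, (-d, idx))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, idx))
        # every unvisited point lies at distance >= r * cell from q
        if len(best) == k and -best[0][0] <= r * cell:
            break
        if seen == len(points):
            break
        r += 1
    return sorted((-nd, idx) for nd, idx in best)

if __name__ == "__main__":
    pts = [(0.5, 0.5), (3.2, 1.1), (0.9, 2.4), (7.5, 7.5)]
    g = build_grid(pts, 1.0)
    print(grid_knn(pts, g, 1.0, (1.0, 1.0), 2))
```

The cell size is a trade-off. If it is too small, many empty rings get scanned. If it is too large, each cell holds many points. The number of neighbouring cells grows exponentially with dimensionality, which is why this only pays off on low-dimensional data.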
    Also, many users will be working with data sets where pairwise distance 
computations are still possible (not everybody has exabytes of vector data), 
so why deprive them of this option, even if it is too costly for others?
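When all pairwise distances fit in the compute budget, exact kNN is a full scan over the training set for each query. The sketch below assumes small in-memory lists of coordinate tuples. `knn_bruteforce`, `knn_classify` and `knn_regress` are hypothetical names. Majority vote and neighbour mean are the textbook classification and regression rules; this thread does not specify them.

```python
import heapq
import math
from collections import Counter

def knn_bruteforce(train, queries, k):
    """For each query, return the indices of its k nearest training points (exact)."""
    result = []
    for q in queries:
        # one distance per training point: |train| * |queries| distances in total
        result.append(heapq.nsmallest(k, range(len(train)), key=lambda i: math.dist(q, train[i])))
    return result

def knn_classify(train, labels, queries, k):
    # majority vote over the labels of the k nearest neighbours
    return [Counter(labels[i] for i in nn).most_common(1)[0][0]
            for nn in knn_bruteforce(train, queries, k)]

def knn_regress(train, targets, queries, k):
    # mean target value of the k nearest neighbours
    return [sum(targets[i] for i in nn) / len(nn)
            for nn in knn_bruteforce(train, queries, k)]

if __name__ == "__main__":
    train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(train, labels, [(0.2, 0.2), (10.5, 10.5)], k=3))  # ['a', 'b']
```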
    Last but not least, exact kNN can also be useful as a baseline for 
evaluation purposes.
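A common way to use the exact result as a baseline is recall@k: the fraction of the true k nearest neighbours that an approximate method also returns. In the sketch below, `sampled_knn` is a deliberately crude, hypothetical stand-in for an approximate method that searches only a fixed random subset of the data. Any real approximate kNN would take its place. All function names are illustrative only.

```python
import math
import random

def exact_knn(data, q, k):
    # exact baseline: rank every point by its distance to q
    return sorted(range(len(data)), key=lambda i: math.dist(q, data[i]))[:k]

def sampled_knn(data, q, k, sample_ids):
    # hypothetical "approximate" method: only searches a fixed subset of the data
    return sorted(sample_ids, key=lambda i: math.dist(q, data[i]))[:k]

def recall_at_k(exact_ids, approx_ids):
    # fraction of the true k nearest neighbours that the approximate method recovered
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    queries = [(rng.random(), rng.random()) for _ in range(50)]
    sample_ids = rng.sample(range(len(data)), 300)
    k = 10
    recalls = [recall_at_k(exact_knn(data, q, k), sampled_knn(data, q, k, sample_ids))
               for q in queries]
    print(f"mean recall@{k}: {sum(recalls) / len(recalls):.2f}")
```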


> Add exact k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
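The issue points to H-BNLJ from [2]. As we read that paper, it splits both R and S into n blocks and computes a nested-loop kNN for each of the n² (R_i, S_j) pairs independently. It then groups the partial results by r and keeps the overall k best. The sketch below only simulates those two rounds sequentially in plain Python to show the data flow. It does not use any Flink or Hadoop API. `split_blocks`, `local_knn` and `block_nested_loop_knn_join` are hypothetical names.

```python
import heapq
import math
from collections import defaultdict

def split_blocks(items, n):
    # round-robin split of (id, point) pairs into n blocks
    return [items[i::n] for i in range(n)]

def local_knn(r_block, s_block, k):
    # round-1 task for one (R_i, S_j) pair: nested-loop kNN of each r over S_j
    out = []
    for rid, r in r_block:
        cands = heapq.nsmallest(k, ((math.dist(r, s), sid) for sid, s in s_block))
        out.extend((rid, c) for c in cands)
    return out

def block_nested_loop_knn_join(R, S, k, n):
    """Exact kNN join: maps each r id to the ids of its k nearest points in S."""
    r_blocks = split_blocks(list(enumerate(R)), n)
    s_blocks = split_blocks(list(enumerate(S)), n)
    # round 1: n * n independent tasks, one per (R_i, S_j) pair
    partial = defaultdict(list)
    for rb in r_blocks:
        for sb in s_blocks:
            for rid, cand in local_knn(rb, sb, k):
                partial[rid].append(cand)
    # round 2: group by r and keep the global k best of its n partial lists
    return {rid: [sid for _, sid in heapq.nsmallest(k, cands)]
            for rid, cands in partial.items()}

if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(0.1, 0.0), (4.9, 5.0), (1.0, 1.0), (6.0, 6.0)]
    print(block_nested_loop_knn_join(R, S, k=2, n=2))
```

Every r is still compared with every s exactly once, so the total work remains |R|·|S| distance computations. What the blocking buys is n² independent tasks that can run in parallel. As we understand [2], H-BRJ instead builds an R-tree over each S_j so that the inner loop does not have to touch every point.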



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
