[
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697106#comment-14697106
]
ASF GitHub Bot commented on FLINK-1745:
---------------------------------------
Github user kno10 commented on the pull request:
https://github.com/apache/flink/pull/696#issuecomment-131123325
On low-dimensional data, exact kNN may be feasible using grid-based
approaches even for very large data sets. It's not very "sexy" to implement
this, but it's also not very hard.
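
To make the grid idea concrete, here is a minimal sketch in Python (not Flink code, and not what the pull request implements). The names `build_grid` and `grid_knn` and the `cell` width parameter are assumptions made up for illustration. Points are hashed into uniform cells, and the search expands ring by ring around the query's cell until the k-th best distance is no larger than the distance to any unvisited cell, which keeps the result exact.

```python
import heapq
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell):
    """Hash each point index into a uniform grid cell of width `cell`."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(idx)
    return grid


def grid_knn(points, grid, cell, query, k):
    """Exact k nearest neighbours of `query`, as sorted (distance, index) pairs."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dim = len(query)
    home = tuple(math.floor(c / cell) for c in query)
    best = []  # max-heap emulated with negated distances: (-distance, index)
    ring = 0
    while True:
        # Visit only the cells whose Chebyshev offset from the home cell is `ring`.
        for offset in product(range(-ring, ring + 1), repeat=dim):
            if max(abs(o) for o in offset) != ring:
                continue
            key = tuple(h + o for h, o in zip(home, offset))
            for idx in grid.get(key, ()):
                d = math.dist(points[idx], query)
                if len(best) < k:
                    heapq.heappush(best, (-d, idx))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, idx))
        # Every unvisited point is farther than ring * cell from the query,
        # so once the k-th best distance is within that bound we can stop.
        if len(best) == k and -best[0][0] <= ring * cell:
            break
        ring += 1
    return sorted((-neg_d, idx) for neg_d, idx in best)
```

The number of cells per ring grows exponentially with the dimension, which is why this only pays off on low-dimensional data.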
Also, a lot of users will be using data sets where pairwise distance
computations are still possible (it's not as if everybody has exabyte vector
data), so why deprive them of this option, even if it is too costly for others?
Last but not least, for evaluation purposes, exact kNN can be useful as a
baseline, too.
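
As a rough illustration of the last two points, a brute-force exact kNN plus a recall measure against it could look like the following Python sketch. The function names `exact_knn` and `recall_at_k` are hypothetical and not part of Flink or of this pull request; the approximate result passed to `recall_at_k` would come from whatever approximate method is being evaluated.

```python
import heapq
import math


def exact_knn(train, queries, k):
    """Brute-force kNN: for each query, compute the distance to every training point."""
    neighbours = []
    for q in queries:
        # nsmallest keeps only k candidates, so memory stays O(k) per query.
        nearest = heapq.nsmallest(
            k, ((math.dist(p, q), i) for i, p in enumerate(train))
        )
        neighbours.append([i for _, i in nearest])
    return neighbours


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

The cost is one distance computation per (query, training point) pair, which is exactly the quadratic behaviour that is still affordable on moderately sized data sets.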
> Add exact k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------------
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial,
> it is still used as a means to classify data and to do regression. This issue
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]