[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544997#comment-14544997
 ] 

Chiwan Park commented on FLINK-1745:
------------------------------------

Hello, All. I pushed a draft of exact kNN Join in my github repository. 
(https://github.com/chiwanpark/flink/tree/FLINK-1745)
There are some issues in this draft:

# Because the {{Vector}} trait doesn't implement {{Comparable}} interface, we 
cannot use {{Vector}} as a key. But we need to group data by {{Vector}}, so I 
added a UUID string with each given vector. (I cannot add a {{java.util.UUID}} 
object because of lack of non-args constructor in {{java.util.UUID}}.) I'm 
concerned about overhead of the UUID string.
# Currently, the implementation is block nested loop kNN join. (called H-BNLJ 
in the given paper) We can use R-Tree to improve performance in nested loop 
stage. But I cannot find a R-Tree implementation with multi-dimension in scala. 
The implementation of H-BRJ in the paper uses ELKI Framework 
(http://elki.dbs.ifi.lmu.de), but I think we cannot use it because it is under 
AGPLv3 license.

How can I solve this issues?
If you tell your opinion to me, I really appreciate it.

Regards
Chiwan Park

> Add exact k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Chiwan Park
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
> it is still used as a mean to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to