[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502686#comment-14502686 ]

Chiwan Park commented on FLINK-1745:
------------------------------------

I suggest the following usage scenario for KNN:

{code}
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.LabeledVector

val trainingDS: DataSet[LabeledVector] = ... // training data
// We need some distance measure model to calculate the distance between two
// vectors, such as Euclidean distance, cosine distance, or Manhattan distance.
val distanceMeasure = ...

val knn = KNN().setK(10).setDistanceMeasure(distanceMeasure)
val model: KNNModel = knn.fit(trainingDS)

val testingDS = ... // testing data
// KNNModel.transform(Vector) could also be provided to predict a single vector.
val predictionDS: DataSet[LabeledVector] = model.transform(testingDS)
{code}
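For illustration, here is a minimal local (non-distributed) sketch of what `KNNModel.transform(Vector)` might do for a single vector. It assumes a brute-force search and a majority vote over the labels of the k nearest training points. It uses plain Scala on `Array[Double]` rather than flink-ml's `Vector`. `KnnClassifySketch`, `Labeled`, `euclidean`, and `predict` are hypothetical names and are not part of flink-ml.

```scala
// Hypothetical, local (non-distributed) sketch of single-vector kNN
// classification. Names are illustrative only and are NOT part of flink-ml.
object KnnClassifySketch {
  // A labeled training point: class label plus feature vector.
  final case class Labeled(label: Double, features: Array[Double])

  // Euclidean distance, used as the default distance measure.
  def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)
  }

  // Predict the label of `query` by majority vote among its k nearest
  // training points (brute force: every training point is compared).
  def predict(
      training: Seq[Labeled],
      query: Array[Double],
      k: Int,
      distance: (Array[Double], Array[Double]) => Double = euclidean
  ): Double = {
    require(k > 0 && training.nonEmpty, "need k > 0 and non-empty training data")
    val nearest = training.sortBy(p => distance(p.features, query)).take(k)
    val votes = nearest.groupBy(_.label).map { case (label, ps) => (label, ps.size) }
    // Highest vote count wins; on a tie, the smaller label is preferred.
    votes.maxBy { case (label, count) => (count, -label) }._1
  }
}
```

In this sketch, the `distance` parameter plays the role of `setDistanceMeasure(...)` in the scenario. A distributed version would presumably need to partition or broadcast the training set instead of sorting it locally.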

The names of the methods and classes are inspired by the CoCoA implementation
in flink-ml. :)
The concept of a distance measure is inspired by the Mahout implementation
(https://github.com/apache/mahout/blob/master/mr/src/main/java/org/apache/mahout/common/distance/DistanceMeasure.java).
I think we need a separate issue for the distance measure.
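Here is a rough sketch of such a distance measure abstraction. It assumes a single `distance` method in the spirit of Mahout's `DistanceMeasure` interface. `DistanceMeasureSketch`, `Euclidean`, `Manhattan`, and `Cosine` are hypothetical names. The vectors are plain `Array[Double]` to keep the sketch self-contained.

```scala
// Hypothetical sketch of a pluggable distance measure, loosely modeled on the
// idea behind Mahout's DistanceMeasure. Names are illustrative only.
object DistanceMeasureSketch {
  trait DistanceMeasure {
    // Smaller values mean "closer"; both vectors must have the same length.
    def distance(a: Array[Double], b: Array[Double]): Double
  }

  private def checkDims(a: Array[Double], b: Array[Double]): Unit =
    require(a.length == b.length, "vectors must have the same dimension")

  // sqrt(sum_i (a_i - b_i)^2)
  object Euclidean extends DistanceMeasure {
    def distance(a: Array[Double], b: Array[Double]): Double = {
      checkDims(a, b)
      math.sqrt(a.indices.map(i => math.pow(a(i) - b(i), 2)).sum)
    }
  }

  // sum_i |a_i - b_i|
  object Manhattan extends DistanceMeasure {
    def distance(a: Array[Double], b: Array[Double]): Double = {
      checkDims(a, b)
      a.indices.map(i => math.abs(a(i) - b(i))).sum
    }
  }

  // 1 - cosine similarity; undefined for zero vectors.
  object Cosine extends DistanceMeasure {
    def distance(a: Array[Double], b: Array[Double]): Double = {
      checkDims(a, b)
      val dot = a.indices.map(i => a(i) * b(i)).sum
      val normA = math.sqrt(a.map(x => x * x).sum)
      val normB = math.sqrt(b.map(x => x * x).sum)
      require(normA > 0 && normB > 0, "cosine distance is undefined for zero vectors")
      1.0 - dot / (normA * normB)
    }
  }
}
```

Cosine distance is taken here as 1 minus cosine similarity, so it ignores vector length. This is one common convention, not necessarily the one a flink-ml implementation would choose.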

How about this scenario?

> Add k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------
>
>                 Key: FLINK-1745
>                 URL: https://issues.apache.org/jira/browse/FLINK-1745
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>              Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial,
> it is still used as a means to classify data and to do regression.
> It could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
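The issue mentions regression as well as classification, so here is a minimal local sketch of kNN regression. It assumes Euclidean distance and an unweighted mean of the k nearest target values. `KnnRegressSketch`, `Sample`, `predict`, and `predictAll` are hypothetical names. `predictAll` is a brute-force kNN join that compares every query with every training point. [2] looks at performing kNN joins with MapReduce for large data.

```scala
// Hypothetical local sketch of kNN regression: predict the unweighted mean
// target of the k nearest training points (Euclidean distance).
object KnnRegressSketch {
  final case class Sample(target: Double, features: Array[Double])

  private def euclidean(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)
  }

  // Mean target value of the k nearest samples (fewer if k > training size).
  def predict(training: Seq[Sample], query: Array[Double], k: Int): Double = {
    require(k > 0 && training.nonEmpty, "need k > 0 and non-empty training data")
    val nearest = training.sortBy(s => euclidean(s.features, query)).take(k)
    nearest.map(_.target).sum / nearest.size
  }

  // Brute-force kNN join: one prediction per query, each scanning the full
  // training set, i.e. O(|queries| * |training|) distance computations.
  def predictAll(training: Seq[Sample], queries: Seq[Array[Double]], k: Int): Seq[Double] =
    queries.map(q => predict(training, q, k))
}
```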



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
