[
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114458#comment-15114458
]
ASF GitHub Bot commented on FLINK-1745:
---------------------------------------
Github user danielblazevski commented on the pull request:
https://github.com/apache/flink/pull/1220#issuecomment-174329818
@chiwanpark I see, I thought there might be a way to avoid using a cross
altogether. I changed the code according to your suggestion and got an error.
First, I added the line
```scala
val sizeHint = resultParameters.get(SizeHint).get
```
before the
```scala
val crossTuned = sizeHint match {...
```
clause. Attached is a screenshot from IntelliJ.
<img width="1280" alt="screenshot 2016-01-24 13 27 00"
src="https://cloud.githubusercontent.com/assets/10012612/12538089/3d9801a4-c29e-11e5-9c8d-419c06fa7553.png">
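For reference, something like the following is roughly what I had in mind, matching on the `Option` returned by `resultParameters.get(SizeHint)` instead of calling `.get` (which throws when the parameter was never set). This is only a sketch: the `CrossHint` cases and the `trainingSet`/`testSet` names are placeholders, not the exact code in the branch.

```scala
import org.apache.flink.api.common.operators.base.CrossOperatorBase.CrossHint

// Sketch only: match on the Option itself rather than calling .get,
// so an unset SizeHint falls through to a plain cross.
val crossTuned = resultParameters.get(SizeHint) match {
  case Some(CrossHint.FIRST_IS_SMALL)  => trainingSet.crossWithHuge(testSet)
  case Some(CrossHint.SECOND_IS_SMALL) => trainingSet.crossWithTiny(testSet)
  case _                               => trainingSet.cross(testSet)
}
```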
Another logistical question for @chiwanpark and @tillrohrmann: I see that the
directory structure of Flink has changed since my initial PR, and I'm not sure
what the best practice is here. I see a couple of less-than-ideal options:
(1) open a new PR with the updated directory structure, which is not ideal, or
(2) pull the master branch and merge it into this branch, but then my next push
would add many, many commits that are not relevant to this PR (even less ideal...).
On a smaller note, I see your point @chiwanpark about raising the flag
earlier for the choice of metric when using a quadtree. Do we want to do this
in `fit`, though? In `fit` I can get the metric and the `useQuadTree`
parameter, but if the user does not call `setUseQuadTree`, I still fall back on
a conservative test that needs to know how many training and test points there
are. That test determines whether or not to use the quadtree (i.e. a quadtree
is used only when the test predicts a performance gain). Is it OK to put this
in `predictValues` instead, where all the variables needed -- the metric and
the training and test sets -- have already been passed? Otherwise I will have
to refactor the code more.
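To make the conservative test concrete, here is a rough sketch of the kind of decision I have in mind. The function name, the dimension cutoff, and the size threshold are all placeholders for illustration, not the actual values in the branch:

```scala
// Rough sketch: respect an explicit setUseQuadTree choice, otherwise fall
// back to a conservative size/dimension test. Thresholds are placeholders.
def decideUseQuadTree(userChoice: Option[Boolean],
                      dim: Int,
                      numTrain: Long,
                      numTest: Long): Boolean =
  userChoice.getOrElse {
    // quadtrees tend to pay off only in low dimensions, and only when the
    // brute-force cross (numTrain * numTest comparisons) would be expensive
    dim <= 4 && numTrain * numTest > 1000000L
  }
```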
I changed the format based on @chiwanpark's suggestion so that it looks
like what @tillrohrmann suggested.
I committed and pushed the code if you'd like to take a look. I added a
knn.md file in docs, but that is still very much a work in progress :-)
> Add exact k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------------
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Daniel Blazevski
> Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial,
> it is still used as a means to classify data and to do regression. This issue
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)