[
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114458#comment-15114458
]
ASF GitHub Bot commented on FLINK-1745:
---------------------------------------
Github user danielblazevski commented on the pull request:
https://github.com/apache/flink/pull/1220#issuecomment-174329818
@chiwanpark I see, I thought there might be a way to avoid using a cross
altogether. I changed the code according to your suggestion and got an error.
First, I added the line
```scala
val sizeHint = resultParameters.get(SizeHint).get
```
before the
```scala
val crossTuned = sizeHint match {...
```
clause. Attached is a screenshot from IntelliJ.
<img width="1280" alt="screenshot 2016-01-24 13 27 00"
src="https://cloud.githubusercontent.com/assets/10012612/12538089/3d9801a4-c29e-11e5-9c8d-419c06fa7553.png">
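For reference, something like the following is roughly what I had in mind, matching on the `Option` returned by `resultParameters.get(SizeHint)` instead of calling `.get` (which throws when the parameter was never set). This is only a sketch: the `CrossHint` cases and the `trainingSet`/`testSet` names are placeholders, not the exact code in the branch.

```scala
import org.apache.flink.api.common.operators.base.CrossOperatorBase.CrossHint

// Sketch only: match on the Option itself rather than calling .get,
// so an unset SizeHint falls through to a plain cross.
val crossTuned = resultParameters.get(SizeHint) match {
  case Some(CrossHint.FIRST_IS_SMALL)  => trainingSet.crossWithHuge(testSet)
  case Some(CrossHint.SECOND_IS_SMALL) => trainingSet.crossWithTiny(testSet)
  case _                               => trainingSet.cross(testSet)
}
```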
Another logistical question for @chiwanpark and @tillrohrmann: I see that the
directory structure of Flink has changed since my initial PR, and I'm not sure
what the best practice is here. I see a couple of less-than-ideal options:
(1) open a new PR with the updated directory structure, which is not ideal, or
(2) pull the master branch and merge it into this branch, but then my next push
would add many, many commits that are not relevant to this PR (even less ideal...).
On a smaller note, I see your point @chiwanpark about raising the flag
earlier for the choice of metric when using a quadtree. Do we want to do this
in `fit`, though? In `fit` I can get the metric and the `useQuadTree`
parameter, but if the user does not call `setUseQuadTree`, I still fall back on
a conservative test that needs to know how many training and test points there
are. That test determines whether or not to use the quadtree (i.e. a quadtree
is used only when the test predicts a performance gain). Is it OK to put this
in `predictValues` instead, where all the variables needed -- the metric and
the training and test sets -- have already been passed? Otherwise I will have
to refactor the code more.
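To make the conservative test concrete, here is a rough sketch of the kind of decision I have in mind. The function name, the dimension cutoff, and the size threshold are all placeholders for illustration, not the actual values in the branch:

```scala
// Rough sketch: respect an explicit setUseQuadTree choice, otherwise fall
// back to a conservative size/dimension test. Thresholds are placeholders.
def decideUseQuadTree(userChoice: Option[Boolean],
                      dim: Int,
                      numTrain: Long,
                      numTest: Long): Boolean =
  userChoice.getOrElse {
    // quadtrees tend to pay off only in low dimensions, and only when the
    // brute-force cross (numTrain * numTest comparisons) would be expensive
    dim <= 4 && numTrain * numTest > 1000000L
  }
```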
I changed the format based on @chiwanpark's suggestion so that it looks
like what @tillrohrmann suggested.
I committed and pushed the code if you'd like to take a look. I added a
knn.md file in docs, but that is still very much a work in progress :-)
> Add exact k-nearest-neighbours algorithm to machine learning library
> --------------------------------------------------------------------
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Daniel Blazevski
> Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial,
> it is still used as a means to classify data and to do regression. This issue
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)