[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335476#comment-14335476
 ] 

Xiangrui Meng commented on SPARK-2336:
--------------------------------------

[~Rusty] Could you provide a summary of your plan? For example,

1. Which paper are you going to follow? Any modification to the algorithm 
proposed in the paper?
2. What's the complexity of the algorithm and the expected scalability?
3. What is the trade-off between the approximation error and the cost? How do you 
want to expose it to users?

> Approximate k-NN Models for MLLib
> ---------------------------------
>
>                 Key: SPARK-2336
>                 URL: https://issues.apache.org/jira/browse/SPARK-2336
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Brian Gawalt
>            Priority: Minor
>              Labels: clustering, features
>
> After tackling the general k-Nearest Neighbor model as per 
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
> also offer approximate k-Nearest Neighbor. A promising approach would involve 
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with 
> much lower latency than the plain-vanilla kNN versions.
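
To make the idea in the description concrete, here is a minimal sketch in plain Python (no Spark dependency) of one way it could work: build a kd-tree per partition, run a budgeted search in each tree, and merge the per-partition candidates. This is not the algorithm from the linked page or a proposed MLlib API. The names build_kdtree, query, approx_knn and the max_visits parameter are hypothetical, and capping the number of visited nodes is just one assumed way to trade accuracy for cost.

```python
import heapq


def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query(tree, target, k, max_visits):
    """Return up to k (sq_dist, point) pairs, nearest first.

    At most max_visits nodes are examined (hypothetical accuracy knob),
    so the result is approximate when the budget runs out early.
    """
    best = []   # max-heap of the k best so far, stored as (-sq_dist, point)
    visits = 0

    def search(node):
        nonlocal visits
        if node is None or visits >= max_visits:
            return
        visits += 1
        point, axis, left, right = node
        d = sq_dist(point, target)
        if len(best) < k:
            heapq.heappush(best, (-d, point))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, point))
        diff = target[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        search(near)
        # Cross the splitting plane only if it could hold a closer point.
        if len(best) < k or diff * diff < -best[0][0]:
            search(far)

    search(tree)
    return sorted((-neg_d, p) for neg_d, p in best)


def approx_knn(partitions, target, k, max_visits):
    """Query one kd-tree per partition and merge the candidates."""
    trees = [build_kdtree(list(part)) for part in partitions]
    candidates = [c for t in trees for c in query(t, target, k, max_visits)]
    return heapq.nsmallest(k, candidates)


# Lists stand in for data partitions in this sketch.
partitions = [[(0, 0), (1, 1), (5, 5)], [(0.5, 0.5), (9, 9)], []]
exact = approx_knn(partitions, (0, 0), k=2, max_visits=100)
rough = approx_knn(partitions, (0, 0), k=2, max_visits=1)
```

With a generous budget the search is exact. With max_visits=1 only each tree's root is examined, so the answer is cheaper but can miss closer points. In a Spark setting the per-partition build and query steps would presumably run inside each partition, with the merge done on the driver, but that mapping is an assumption, not something stated in this issue.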



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

