[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526153#comment-14526153
 ] 

Sen Fang commented on SPARK-2336:
---------------------------------

Hey Longbao, great to hear from you. To the best of my understanding, in the 
paper I cited above, they distribute the input points by pushing them through 
the top tree (figure 4). Of course, for a precise result this means we would 
need to backtrack, which isn't ideal. What they propose instead is to use a 
buffer boundary like a spill tree's. Unlike a spill tree, however, here you 
push an input target to both children if it falls within the buffer zone, 
because the top tree was built as a metric tree (their stated reason is that 
using a spill tree as the top tree has a high memory penalty). So every input 
might now end up in multiple subtrees, and you need a reduceByKey at the end 
to keep the top K neighbors.
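
The routing and merge steps described above can be sketched roughly as 
follows. This is an illustrative sketch in plain Python, not the paper's 
algorithm verbatim and not a Spark implementation. The names Node, margin, 
route, knn and the buffer width tau are hypothetical, the split is assumed to 
be the midplane between two pivots, a brute-force scan stands in for the 
per-subtree search, and a dictionary fold stands in for reduceByKey.

```python
import heapq
import math
from collections import defaultdict


class Node:
    """Top-tree node: either an internal split on two pivots or a leaf."""
    def __init__(self, left_pivot=None, right_pivot=None,
                 left=None, right=None, leaf_id=None):
        self.left_pivot, self.right_pivot = left_pivot, right_pivot
        self.left, self.right, self.leaf_id = left, right, leaf_id


def margin(q, node):
    """Signed distance from q to the midplane between the two pivots
    (negative on the left pivot's side). Assumed split rule."""
    lp, rp = node.left_pivot, node.right_pivot
    proj = sum((x - (l + r) / 2) * (r - l) for x, l, r in zip(q, lp, rp))
    return proj / math.dist(lp, rp)


def route(q, node, tau):
    """Return the ids of every leaf subtree that q must be sent to."""
    if node.leaf_id is not None:
        return [node.leaf_id]
    m = margin(q, node)
    leaves = []
    if m <= tau:       # on the left side, or inside the buffer zone
        leaves += route(q, node.left, tau)
    if m >= -tau:      # on the right side, or inside the buffer zone
        leaves += route(q, node.right, tau)
    return leaves


def knn(queries, top_tree, subtrees, k, tau):
    """queries: {query_id: point}; subtrees: {leaf_id: [points]}."""
    merged = defaultdict(list)   # stands in for reduceByKey on query_id
    for qid, q in queries.items():
        for leaf in route(q, top_tree, tau):
            # Brute-force scan stands in for the per-subtree search.
            local = heapq.nsmallest(
                k, ((math.dist(q, p), p) for p in subtrees[leaf]))
            # Merge step: keep only the k closest candidates seen so far.
            merged[qid] = heapq.nsmallest(k, merged[qid] + local)
    return {qid: [p for _, p in cands] for qid, cands in merged.items()}


# Toy data: one split with the midplane at x = 5, two leaf subtrees.
top = Node((0.0, 0.0), (10.0, 0.0), Node(leaf_id=0), Node(leaf_id=1))
subtrees = {0: [(1.0, 0.0), (4.0, 0.0), (4.5, 0.0)],
            1: [(5.5, 0.0), (6.0, 0.0), (9.0, 0.0)]}
queries = {"q": (4.8, 0.0)}
with_buffer = knn(queries, top, subtrees, k=2, tau=1.0)
without_buffer = knn(queries, top, subtrees, k=2, tau=0.0)
```

In this toy setup the query sits just left of the boundary. With tau=0.0 it 
is sent only to the left subtree and misses the near neighbor (5.5, 0.0) on 
the other side; with tau=1.0 it is sent to both subtrees and the merge keeps 
the two closest points overall.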

Is your implementation available somewhere? I'm having a hard time finding 
time to finish my implementation this month. It would be great if we could 
eventually compare our implementations, validate them, and benchmark.

> Approximate k-NN Models for MLLib
> ---------------------------------
>
>                 Key: SPARK-2336
>                 URL: https://issues.apache.org/jira/browse/SPARK-2336
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Brian Gawalt
>            Priority: Minor
>              Labels: clustering, features
>
> After tackling the general k-Nearest Neighbor model as per 
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
> also offer approximate k-Nearest Neighbor. A promising approach would involve 
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with 
> much lower latency than the plain-vanilla kNN versions.



