[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

danielblazevski Wed, 07 Oct 2015 05:02:18 -0700

Github user danielblazevski commented on the pull request:

    https://github.com/apache/flink/pull/1220#issuecomment-146175315
  
    @chiwanpark, in lines 203-207
    +                  val useQuadTree = resultParameters.get(useQuadTreeParam).getOrElse(
    +                    training.values.head.size + math.log(math.log(training.values.length)/
    +                      math.log(4.0)) < math.log(training.values.length)/math.log(4.0) &&
    +                    (metric.isInstanceOf[EuclideanDistanceMetric] ||
    +                      metric.isInstanceOf[SquaredEuclideanDistanceMetric]))
    the code decides whether to use the quadtree when no value is specified.
It decides based on the number of training + test points and the dimension,
and it is a conservative estimate: whenever it picks the quadtree, the quadtree
should improve performance compared to the brute-force method. Basically, the
quadtree scales poorly with dimension, but really well with the number of
points.
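
To make the inequality easier to read, here is a minimal standalone sketch of the same check. The object name `QuadTreeHeuristic`, the helper `log4`, and the parameter names are hypothetical and are not part of the PR; the metric test is reduced to a plain Boolean so the snippet runs without Flink on the classpath.

```scala
object QuadTreeHeuristic {
  // Logarithm base 4, written as in the diff: ln(x) / ln(4).
  private def log4(x: Double): Double = math.log(x) / math.log(4.0)

  /** Mirrors the quoted condition: dim + ln(log4(n)) < log4(n),
    * and the metric must be (squared) Euclidean.
    *
    * dim               - number of features per point (training.values.head.size)
    * numPoints         - number of points (training.values.length)
    * metricIsEuclidean - stand-in for the two isInstanceOf checks
    */
  def preferQuadTree(dim: Int, numPoints: Long, metricIsEuclidean: Boolean): Boolean = {
    val l = log4(numPoints.toDouble)
    metricIsEuclidean && (dim + math.log(l) < l)
  }
}
```

For example, with one million points `log4(n)` is just under 10, so `preferQuadTree(2, 1000000L, true)` holds while `preferQuadTree(10, 1000000L, true)` does not, which matches the point that the quadtree only pays off in low dimensions.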
    
    As for using a `Vector` for `minVec` and `maxVec`, I plug in `minVec` and 
`maxVec` to construct the root Node, and I found it best to use a ListBuffer in 
the constructor for the Node class when partitioning the boxes into sub-boxes.
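
As a rough illustration of what partitioning a box into sub-boxes can look like, here is a sketch that splits a box at its midpoint along every axis and collects the children in a `ListBuffer`. This is not the code from the PR: the object name `BoxPartition`, the method `partition`, and the use of Scala's immutable `Vector[Double]` in place of the Flink ML `Vector` are all assumptions made for the example.

```scala
import scala.collection.mutable.ListBuffer

object BoxPartition {
  type Box = (Vector[Double], Vector[Double]) // (minVec, maxVec)

  /** Splits the box [minVec, maxVec] at its midpoint along every axis
    * and returns the resulting 2^d child boxes.
    */
  def partition(minVec: Vector[Double], maxVec: Vector[Double]): List[Box] = {
    require(minVec.length == maxVec.length, "minVec and maxVec must have the same dimension")
    val d = minVec.length
    val mid = minVec.zip(maxVec).map { case (lo, hi) => (lo + hi) / 2.0 }

    val children = ListBuffer.empty[Box]
    // Bit i of mask selects the lower (0) or upper (1) half along axis i.
    for (mask <- 0 until (1 << d)) {
      val lo = Vector.tabulate(d)(i => if ((mask & (1 << i)) == 0) minVec(i) else mid(i))
      val hi = Vector.tabulate(d)(i => if ((mask & (1 << i)) == 0) mid(i) else maxVec(i))
      children += ((lo, hi))
    }
    children.toList
  }
}
```

The number of children doubles with every added dimension (4 in 2D, 8 in 3D, 1024 in 10D), which is one way to see why the quadtree scales poorly with dimension.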



