Github user chiwanpark commented on the pull request:

    https://github.com/apache/flink/pull/1220#issuecomment-145353515
  
    I just reviewed quickly. I have some comments for this pull request.
    
    1. This pull request cannot pass checking Scala code style. The Scala code 
should not exceed 100 characters in each line. I also found some wrong 
indentations such as line 200-211 in `KNN.scala` file. Please reformat your 
code.
    2. The meaning of `BruteOrQuad` is ambiguous. From my understand, if 
`BruteOrQuad` is true, we use quadtree implementation, right? So the other name 
such as `useQuadTree` would be better.
    3. Please remove blank line after Scala doc comments.
    4. Your implementation assumes that the type of given `Vector` is 
`DenseVector`. But basically, there is no guarantee of the type of `Vector`. We 
should generalize the implementation.
    5. We should be able to configure the use of quadtree implementation. 
Please add configure parameter into `KNN` object.
    6. Currently, `predictDataSet` is too big and complex because of large code 
in `mapPartition` operation. How about split the operation into `mapPartition` 
for raw and quadtree?
    7. Please avoid calculation of `BigDecimal`. Maybe we can check whether we 
use quadtree or not without `BigDecimal` like following:
      * `(4 ^ dim) * Ntest * log(Ntrain) < Ntest * Ntrain` is same as `dim + 
log_4(log(Ntrain)) < log_4(Ntrain)`
    
    This is great start! If you have some question about my comments, please 
feel free to reply.
    
    P.S.: How about create training quadtree before cross operation? Is there 
any intention to create quadtree after cross operation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to