Github user danielblazevski commented on the pull request:
https://github.com/apache/flink/pull/1220#issuecomment-145364401
Thanks @chiwanpark for the very useful comments. I have made changes based on
them, which can be found here:
https://github.com/danielblazevski/flink/tree/FLINK-1745/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/nn
I also changed the testing of KNN + QuadTree, which can be found here:
https://github.com/danielblazevski/flink/tree/FLINK-1745/flink-staging/flink-ml/src/test/scala/org/apache/flink/ml/nn
Since useQuadTree is now a parameter, I no longer needed KNNQuadTreeSuite,
so I removed it.
I have not addressed comment 6 yet. I need the training set before I can
choose a value for useQuadTree when the user does not specify one, so any
main if (useQuadTree) check should come within
`val crossed = trainingSet.cross(inputSplit).mapPartition {`.
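To make the point about comment 6 concrete, here is a minimal sketch in plain Scala (no Flink dependency). Everything in it is an assumption for illustration: the names `decideUseQuadTree` and `processPartition` are hypothetical, the thresholds are made up, and the iterator of (training block, test block) pairs only stands in for what the function passed to `mapPartition` would receive after the cross.

```scala
object UseQuadTreeSketch {
  type Point = Vector[Double]

  // Hypothetical heuristic, NOT taken from the PR: honour the user's choice if
  // present, otherwise look at the training block (low dimension, enough points).
  def decideUseQuadTree(userChoice: Option[Boolean], training: Seq[Point]): Boolean =
    userChoice.getOrElse {
      val dim = training.headOption.map(_.length).getOrElse(0)
      dim > 0 && dim <= 4 && training.length >= 100
    }

  // Stand-in for the body passed to mapPartition: the decision can only be
  // made here, because this is the first place the training block is visible.
  def processPartition(
      pairs: Iterator[(Seq[Point], Seq[Point])],
      userChoice: Option[Boolean]): Iterator[String] =
    pairs.map { case (training, _) =>
      if (decideUseQuadTree(userChoice, training)) "quadtree" else "brute-force"
    }

  def main(args: Array[String]): Unit = {
    val small = Seq(Vector(0.0, 0.0), Vector(1.0, 1.0))
    val large = Seq.tabulate(200)(i => Vector(i.toDouble, (i * 2).toDouble))
    // No user choice given, so each block is judged on its own training data.
    println(processPartition(Iterator((small, small), (large, small)), None).toList)
  }
}
```

The sketch only shows where the decision would sit; the actual criterion is still open.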
About your last "P.S." comment: creating the quadtree after the cross
operation is likely more efficient -- each CPU/node will form its own
quadtree, which is what is suggested for the R-tree here:
https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf
This will result in less communication overhead than creating a more global
quadtree, if that is what you were referring to.
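As a rough illustration of the "one quadtree per block" idea, here is a self-contained sketch in plain Scala. It is not the code in the PR: the `Node` class, `knnAcrossBlocks`, the 2D unit-square assumption, and the leaf capacity are all hypothetical, and a plain `Seq` of blocks stands in for the partitions produced by the cross. Each block builds its own tree locally, and only the k local candidates per block are merged.

```scala
object LocalQuadTreeSketch {
  final case class P(x: Double, y: Double) {
    def dist2(o: P): Double = { val dx = x - o.x; val dy = y - o.y; dx * dx + dy * dy }
  }

  // Square cell centred at (cx, cy) with half-width h; a leaf splits once it
  // holds more than `cap` points. Assumes all points lie inside the root cell.
  final class Node(cx: Double, cy: Double, h: Double, cap: Int = 4) {
    private var pts = List.empty[P]
    private var kids = Vector.empty[Node]

    def insert(p: P): Unit =
      if (kids.nonEmpty) child(p).insert(p)
      else {
        pts ::= p
        if (pts.size > cap && h > 1e-9) { // size guard stops endless splits on duplicates
          val q = h / 2
          kids = Vector(
            new Node(cx - q, cy - q, q, cap), new Node(cx + q, cy - q, q, cap),
            new Node(cx - q, cy + q, q, cap), new Node(cx + q, cy + q, q, cap))
          val old = pts
          pts = Nil
          old.foreach(o => child(o).insert(o))
        }
      }

    private def child(p: P): Node =
      kids((if (p.x >= cx) 1 else 0) + (if (p.y >= cy) 2 else 0))

    // Squared distance from p to this cell (0 if p is inside the cell).
    private def boxDist2(p: P): Double = {
      val dx = math.max(0.0, math.abs(p.x - cx) - h)
      val dy = math.max(0.0, math.abs(p.y - cy) - h)
      dx * dx + dy * dy
    }

    // k nearest points to q; `best` stays sorted by distance, and cells that
    // are farther away than the current k-th best are skipped.
    def knn(q: P, k: Int, best: List[P] = Nil): List[P] = {
      require(k >= 1, "k must be positive")
      if (best.size == k && boxDist2(q) > best.last.dist2(q)) best
      else if (kids.isEmpty) (best ++ pts).sortBy(_.dist2(q)).take(k)
      else kids.sortBy(_.boxDist2(q)).foldLeft(best)((b, c) => c.knn(q, k, b))
    }
  }

  // Stand-in for the work after the cross: every block builds its own tree,
  // and only k candidates per block need to be combined afterwards.
  def knnAcrossBlocks(blocks: Seq[Seq[P]], q: P, k: Int): List[P] = {
    val local = blocks.map { block =>
      val tree = new Node(0.5, 0.5, 0.5) // root cell = unit square (assumption)
      block.foreach(tree.insert)
      tree.knn(q, k)
    }
    local.flatten.sortBy(_.dist2(q)).take(k).toList
  }

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random(42)
    val training = Seq.fill(200)(P(rnd.nextDouble(), rnd.nextDouble()))
    val q = P(0.3, 0.7)
    val viaTrees = knnAcrossBlocks(training.grouped(50).toSeq, q, 3)
    val brute = training.sortBy(_.dist2(q)).take(3).toList
    println(s"matches brute force: ${viaTrees == brute}")
  }
}
```

In this arrangement no tree is ever shipped between blocks; a global quadtree would instead have to be built in one place and distributed before the per-block work could start.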