GitHub user njayaram2 commented on the issue:
https://github.com/apache/incubator-madlib/pull/80
Overall, this looks great and the code seems to be working correctly.
We can certainly make various modifications to it. For instance:
- Try other distance measures. If they turn out to work with this
approach, the distance measure should be exposed as an optional parameter.
- Improve performance through parallel processing, for example on a
distributed database such as Greenplum. User-defined aggregates (UDAs)
implemented in C++ can help find the k-nearest neighbors in parallel;
see Chapter 1 of the design document I mentioned earlier:
http://madlib.incubator.apache.org/design.pdf
I guess that is something we can look at once you have addressed the
comments made on this version. We can discuss it when we get there!
- Incorporate other changes the community might suggest. There are
several PostgreSQL experts in the community who might be able to suggest
ways to make the existing code more performant!
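
To make the first two suggestions a bit more concrete, here is a rough sketch in plain Python. It is not MADlib code and not the implementation in this PR: the metric names, the function names, and the list-of-segments input are all hypothetical, chosen only to illustrate the idea. It shows the distance measure as an optional parameter, and the transition / merge / final shape a UDA would need so that each segment can keep its own k closest rows before the partial results are combined.

```python
import heapq
from collections import Counter

# Hypothetical metric registry; these names are illustrative, not MADlib's API.
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

METRICS = {"squared_l2": squared_l2, "l1": l1}

def knn_transition(state, point, label, query, k, dist):
    """Fold one training row into a state holding the k closest rows so far."""
    candidate = (dist(point, query), label)
    return heapq.nsmallest(k, state + [candidate], key=lambda t: t[0])

def knn_merge(state_a, state_b, k):
    """Combine two partial states; only the k closest rows overall survive."""
    return heapq.nsmallest(k, state_a + state_b, key=lambda t: t[0])

def knn_final(state):
    """Majority vote over the labels of the k nearest neighbors."""
    labels = [label for _, label in state]
    return Counter(labels).most_common(1)[0][0]

def knn_predict(segments, query, k=3, metric="squared_l2"):
    """Classify `query`; `segments` stands in for rows spread over segments."""
    dist = METRICS[metric]  # an unknown metric name raises KeyError
    partials = []
    for rows in segments:  # on a real cluster each segment would run in parallel
        state = []
        for point, label in rows:
            state = knn_transition(state, point, label, query, k, dist)
        partials.append(state)
    merged = []
    for state in partials:
        merged = knn_merge(merged, state, k)
    return knn_final(merged)
```

A call such as `knn_predict(segments, (0.2, 0.1), k=3, metric="l1")` switches the distance measure without touching the rest of the logic. The merge step is what makes the parallel version possible: each partial state never holds more than k rows, so very little data has to move between segments.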