[ https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752824#comment-15752824 ]
ASF GitHub Bot commented on MADLIB-927:
---------------------------------------
Github user njayaram2 commented on the issue:
https://github.com/apache/incubator-madlib/pull/80
This is a great start!
I will provide some github-specific feedback here, and more knn-specific
comments in the code.
Git can be daunting to use at first, but it's great once you get the hang
of it.
I would recommend going through the following wonderful book if you have
not already done so:
https://git-scm.com/book/en/v2
When you work on a feature/bug, it is best if you create a branch locally
and make all changes for that feature there. You can then push that branch
into your github repo and open a pull request. This way you won't mess with
your local master branch, which should ideally be in sync with the origin's
(apache/incubator-madlib in this case) master branch. More information on
how to work with branches can be found in the following chapter:
https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
(especially section 3.5)
One other minor suggestion is to include the corresponding JIRA id in the
commit message. The JIRA associated with this feature is:
https://issues.apache.org/jira/browse/MADLIB-927
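Putting the two suggestions together, the branch/commit/push workflow might look like the sketch below. It runs end-to-end in scratch repositories (a local bare repo stands in for your GitHub fork); in practice you would run the branch, commit, and push commands inside your existing incubator-madlib clone. The branch name and commit message are examples, not prescribed.

```shell
set -e
# Work in a throwaway directory so the sketch is runnable end-to-end.
cd "$(mktemp -d)"

# A local bare repository stands in for your GitHub fork ("origin"):
git init -q --bare fork.git
git clone -q fork.git work
cd work
git config user.email you@example.com
git config user.name "You"
git commit -q --allow-empty -m "initial commit on master"

# Create a topic branch for the feature instead of committing to master:
git checkout -q -b madlib-927-knn

# ...edit files... then commit, putting the JIRA id in the commit message:
git commit -q --allow-empty -m "MADLIB-927: initial implementation of k-NN"

# Push the branch to your fork and open the pull request from that branch:
git push -q origin madlib-927-knn
```

This keeps your local master untouched and in sync with apache/incubator-madlib, and the pull request is opened from the topic branch on your fork.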
> Initial implementation of k-NN
> ------------------------------
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k
> nearest neighbors by going through all points.
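For reference, the naïve approach described above (scan all training points, keep the k closest, and predict by majority vote) can be sketched in a few lines of Python. This is only an illustration of the algorithm with made-up data, not MADlib's actual implementation:

```python
from collections import Counter


def euclidean(a, b):
    # Euclidean distance; any metric distance function could be used here.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_points, train_labels, test_point, k):
    # The costly "lazy" testing phase: compute the distance from the test
    # point to every training point and keep the k nearest neighbors.
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pl: euclidean(pl[0], test_point))[:k]
    # Majority vote among the labels of the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: two clusters labeled "a" and "b".
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, (0.2, 0.1), k=3))  # -> a
```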
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)