[
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Iyer updated MADLIB-927:
------------------------------
Description:
k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors of
data points in a metric feature space according to a specified distance
function. It is considered one of the canonical algorithms of data science. It
is a nonparametric method, which makes it applicable to a lot of real-world
problems where the data doesn’t satisfy particular distribution assumptions. It
can also be implemented as a lazy algorithm, which means there is no training
phase where information in the data is condensed into coefficients, but there
is a costly testing phase where all data (or some subset) is used to make
predictions.
This JIRA involves implementing the naïve approach - i.e. compute the k nearest
neighbors by going through all points.
was:
k-Nearest Neighbors is a very simple algorithm that is based on finding nearest
neighbors of data points in a metric feature space according to a specified
distance function. It is considered one of the canonical algorithms of data
science. It is a nonparametric method, which makes it applicable to a lot of
real-world problems, where the data doesn’t satisfy particular distribution
assumptions. Also, it can be implemented as a lazy algorithm, which means there
is no training phase where information in the data is condensed into
coefficients, but there is a costly testing phase where all data is used to
make predictions.
This JIRA involves implementing the naïve approach - i.e. compute the k nearest
neighbors by going through all points.
> Initial implementation of k-NN
> ------------------------------
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k
> nearest neighbors by going through all points.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)