[
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836979#comment-15836979
]
ASF GitHub Bot commented on MADLIB-927:
---------------------------------------
Github user njayaram2 commented on the issue:
https://github.com/apache/incubator-madlib/pull/81
Some more input error checks, in addition to what Orhan has already mentioned
above (note that you may have already covered some of these; I just listed
things off the top of my head):
- validity of the types of the parameters `point_column_name` (must be an
array), `label_column_name` (must be an int/bool), `test_column_name` (must
be an array) and `id_column_name` (must be an integer).
- validity of the `operation` parameter. Look for error cases such as an
empty string, NULL, an unrecognized value like `xyz`, etc.
- validity of the `k` parameter. Look for error cases such as 0, -1, and a
number greater than the total number of rows in `point_source`.
It's probably easier to validate input parameters in Python, since you can
use existing helper functions in MADlib's Python modules such as
`utilities.validate_args` and `utilities.utilities`.
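The checks listed above could be sketched in plain Python as follows. This is a minimal, standalone illustration. It does not use the MADlib `utilities` helpers, whose functions are not reproduced here. The function name `knn_arg_errors`, the accepted PostgreSQL type names, and the operation codes `'c'`/`'r'` are all hypothetical assumptions, not the PR's actual code.

```python
from typing import Dict, List, Optional

# Hypothetical: accepted PostgreSQL type names, as format_type() might report them.
ARRAY_TYPES = {"smallint[]", "integer[]", "bigint[]",
               "real[]", "double precision[]", "numeric[]"}
LABEL_TYPES = {"smallint", "integer", "bigint", "boolean"}
ID_TYPES = {"smallint", "integer", "bigint"}
# Hypothetical operation codes: 'c' = classification, 'r' = regression.
VALID_OPERATIONS = {"c", "r"}


def knn_arg_errors(col_types: Dict[str, str], operation: Optional[str],
                   k: Optional[int], num_points: int) -> List[str]:
    """Return a list of problems with the k-NN arguments (empty if all valid).

    col_types maps each column parameter name to the SQL type of the column
    it names; num_points is the row count of point_source.
    """
    errors: List[str] = []

    # Column types: feature columns must be arrays, label int/bool, id integer.
    expected = {
        "point_column_name": ARRAY_TYPES,
        "test_column_name": ARRAY_TYPES,
        "label_column_name": LABEL_TYPES,
        "id_column_name": ID_TYPES,
    }
    for param, allowed in expected.items():
        actual = col_types.get(param)
        if actual not in allowed:
            errors.append(f"{param}: invalid type {actual!r}")

    # operation: reject NULL (None), empty or blank strings, and unknown values.
    if not isinstance(operation, str) or not operation.strip():
        errors.append("operation: must be a non-empty string")
    elif operation.strip().lower() not in VALID_OPERATIONS:
        errors.append(f"operation: unknown value {operation!r}")

    # k: must be a positive integer no larger than the number of rows.
    if isinstance(k, bool) or not isinstance(k, int) or k <= 0:
        errors.append(f"k: must be a positive integer, got {k!r}")
    elif k > num_points:
        errors.append(f"k: {k} exceeds the {num_points} rows in point_source")

    return errors


good = {"point_column_name": "double precision[]",
        "test_column_name": "double precision[]",
        "label_column_name": "integer",
        "id_column_name": "integer"}
print(knn_arg_errors(good, "c", 3, 10))    # []
print(knn_arg_errors(good, "xyz", 0, 10))  # ["operation: unknown value 'xyz'", 'k: must be a positive integer, got 0']
```

Collecting every problem, rather than stopping at the first one, is a design choice. A caller that prefers fail-fast behavior can simply raise an error whenever the returned list is non-empty.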
> Initial implementation of k-NN
> ------------------------------
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach, i.e., computing the k
> nearest neighbors by scanning all points.
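The naïve approach described above can be sketched in Python. This is a minimal in-memory illustration, not MADlib's in-database implementation. Several choices here are assumptions: Euclidean distance as the distance function, majority vote for classification, and the mean of the neighbors' labels for regression. The function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]
Example = Tuple[Point, float]  # (feature vector, label)


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train: Sequence[Example], query: Point, k: int) -> List[Tuple[float, float]]:
    """Naive search: measure the distance from `query` to every training
    point and keep the k smallest, as (distance, label) pairs sorted by distance."""
    distances = ((euclidean(p, query), label) for p, label in train)
    return heapq.nsmallest(k, distances, key=lambda pair: pair[0])


def knn_classify(train: Sequence[Example], query: Point, k: int) -> float:
    """Majority vote among the k nearest labels. Counter.most_common keeps
    first-seen order on ties, so a tied vote goes to the label nearest the query."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train: Sequence[Example], query: Point, k: int) -> float:
    """Average of the k nearest labels."""
    neighbors = k_nearest(train, query, k)
    return sum(label for _, label in neighbors) / len(neighbors)


train = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
         ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((6.0, 5.0), 1)]
print(knn_classify(train, (0.5, 0.5), k=3))  # 0
print(knn_classify(train, (5.5, 5.5), k=3))  # 1
```

There is no training step; all of the work happens at query time. Each query costs one distance computation per training point, which is the costly testing phase the description refers to.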