[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

ASF GitHub Bot (JIRA) Fri, 27 Jan 2017 12:00:56 -0800

    [ https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843401#comment-15843401 ]


ASF GitHub Bot commented on MADLIB-927:
---------------------------------------

Github user njayaram2 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/81
  
    Some useful validation functions are in
    https://github.com/apache/incubator-madlib/blob/master/src/ports/postgres/modules/utilities/validate_args.py_in
    (see the comments in the code for each function, and use whichever ones
    apply to your scenario):
    ```
    table_exists
    get_cols_and_types
    columns_exist_in_table
    is_col_array
    is_var_valid
    input_tbl_valid
    output_tbl_valid
    cols_in_tbl_valid
    ```
    
https://github.com/apache/incubator-madlib/blob/master/src/ports/postgres/modules/utilities/utilities.py_in:
    ```
    unique_string
    ```
    
    You could probably have one function that validates all the input args,
    and include your current validation checks as part of that Python function.
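
    A minimal sketch of what such a single validation function could look
    like. Everything in it is an assumption for illustration: the name
    validate_knn_args, its parameters, and the two injected callables are
    hypothetical stand-ins, not the helpers from validate_args.py_in, whose
    actual signatures should be checked in that file before use.

```python
def validate_knn_args(source_table, columns, output_table, k,
                      table_exists, columns_exist):
    """Run all argument checks in one place; raise on the first failure.

    table_exists(name) -> bool and columns_exist(table, cols) -> bool are
    injected stand-ins (assumptions), NOT the MADlib helpers themselves.
    """
    if not source_table or not table_exists(source_table):
        raise ValueError("kNN error: source table '%s' does not exist"
                         % source_table)
    if not columns or not columns_exist(source_table, columns):
        raise ValueError("kNN error: missing column(s) in '%s'"
                         % source_table)
    if not output_table or table_exists(output_table):
        raise ValueError("kNN error: output table '%s' is empty or already "
                         "exists" % output_table)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or k <= 0:
        raise ValueError("kNN error: k must be a positive integer")


# Hypothetical in-memory "catalog" so the sketch runs without a database.
catalog = {"points": {"id", "features", "label"}}


def exists(table):
    return table in catalog


def cols_ok(table, cols):
    return set(cols) <= catalog.get(table, set())


validate_knn_args("points", ["features", "label"], "knn_out", 3,
                  exists, cols_ok)
```

    Passing the checks in as callables keeps the sketch runnable on its own;
    in the module itself they would be replaced by the real utility functions.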


> Initial implementation of k-NN
> ------------------------------
>
>                 Key: MADLIB-927
>                 URL: https://issues.apache.org/jira/browse/MADLIB-927
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Rahul Iyer
>              Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors 
> of data points in a metric feature space according to a specified distance 
> function. It is considered one of the canonical algorithms of data science. 
> It is a nonparametric method, which makes it applicable to a lot of 
> real-world problems where the data doesn’t satisfy particular distribution 
> assumptions. It can also be implemented as a lazy algorithm, which means 
> there is no training phase where information in the data is condensed into 
> coefficients, but there is a costly testing phase where all data (or some 
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k 
> nearest neighbors by going through all points.
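
The naive approach described above can be sketched in plain Python as follows.
This is only an illustration under stated assumptions, not the MADlib
implementation: the function names, the Euclidean distance, and the
majority-vote step are all choices made for the example.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_brute_force(train, query, k, dist=euclidean):
    """Return the k (distance, label) pairs nearest to `query`.

    `train` is a list of (vector, label) pairs. Every training point is
    visited once, so there is no training phase and each query costs O(n).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    scored = ((dist(vec, query), label) for vec, label in train)
    # Compare on distance only, so labels never need to be orderable.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def knn_classify(train, query, k, dist=euclidean):
    """Predict a label by majority vote among the k nearest neighbors."""
    neighbors = knn_brute_force(train, query, k, dist)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b")]
prediction = knn_classify(train, (0.2, 0.1), k=3)
```

The `dist` parameter reflects the "specified distance function" in the
description; any callable taking two vectors can be passed instead.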



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
