[
https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773398#comment-15773398
]
ASF GitHub Bot commented on MADLIB-927:
---------------------------------------
Github user orhankislal commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/81#discussion_r93790128
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -0,0 +1,41 @@
+m4_include(`SQLCommon.m4')
+/* -----------------------------------------------------------------------------
+ * Test knn.
+ *
+ * FIXME: Verify results
--- End diff --
This file runs as part of install-check. Since the dataset is small, you can
calculate the correct results by hand (or with another knn implementation in
Python, R, etc.) and then use an assertion function to verify that the output
matches.
Since many functions are interconnected, install-check helps us identify
problems faster. Suppose somebody changes the `squared_dist_norm2`
implementation and it starts giving incorrect results. The knn install-check
will then fail and prompt further investigation.
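As a rough illustration of the idea (not the actual test in the pull request), here is a minimal Python sketch of a brute-force reference knn whose output is compared against hand-computed labels. The toy dataset, the `knn_predict` helper, and the Python `squared_dist_norm2` stand-in are all assumptions made for this example; only the function name comes from MADlib.

```python
from collections import Counter


def squared_dist_norm2(a, b):
    # Stand-in for the MADlib function of the same name (assumed here to be
    # the squared Euclidean distance between two points).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, labels, point, k, dist=squared_dist_norm2):
    # Brute-force k-NN: rank every training point by distance to `point`,
    # then take a majority vote over the k closest labels.
    ranked = sorted(range(len(train)), key=lambda i: dist(train[i], point))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy dataset, small enough to verify by hand.
train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]
test_points = [(2, 2), (7, 7)]
expected = [0, 1]  # worked out by hand for k = 3

predicted = [knn_predict(train, labels, p, k=3) for p in test_points]
assert predicted == expected


def broken_dist(a, b):
    # Simulated regression: the sign is flipped, so the farthest points rank first.
    return -squared_dist_norm2(a, b)


broken = [knn_predict(train, labels, p, k=3, dist=broken_dist) for p in test_points]
assert broken != expected  # the same check now catches the faulty distance
```

The second assertion mirrors the scenario described above: a fault in the distance function surfaces as a failing knn check even though the knn code itself is unchanged.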
> Initial implementation of k-NN
> ------------------------------
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding nearest neighbors
> of data points in a metric feature space according to a specified distance
> function. It is considered one of the canonical algorithms of data science.
> It is a nonparametric method, which makes it applicable to a lot of
> real-world problems where the data doesn’t satisfy particular distribution
> assumptions. It can also be implemented as a lazy algorithm, which means
> there is no training phase where information in the data is condensed into
> coefficients, but there is a costly testing phase where all data (or some
> subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. compute the k
> nearest neighbors by going through all points.