[
https://issues.apache.org/jira/browse/MADLIB-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan updated MADLIB-1370:
------------------------------------
Description:
In unsupervised mode of knn
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, the nearest-neighbor
search does not reliably return the 0-distance point as a nearest neighbor.
Could there be a small negative-value issue here, where a distance that is
effectively 0 shows up as a negative epsilon?
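To illustrate the suspected failure mode, here is a minimal Python sketch. It is not MADlib code. The expanded squared-distance formula and the helper names `squared_dist_expanded` and `safe_distance` are assumptions made for illustration; whether MADlib computes distances this way has not been confirmed here. The expanded form is known to be prone to cancellation, so for near-identical points it may round to a tiny negative value, and clamping at zero before the square root is one possible guard.

```python
import math


def squared_dist_expanded(a, b):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2 * (a . b).

    Hypothetical formulation, used only to show where cancellation can
    occur: the three sums are rounded independently, so the result for
    near-identical points may come out slightly below zero.
    """
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa + bb - 2.0 * ab


def safe_distance(sq_dist):
    """Clamp a squared distance at zero, then take the square root."""
    return math.sqrt(max(sq_dist, 0.0))
```

With a guard like this, a value such as -1e-17 is treated as distance 0 rather than raising an error or sorting unexpectedly.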
Also, please assess whether we can add a vector of distances to the output table:
{code}
Output Format
The output of the KNN module is a table with the following columns:
id INTEGER. The ids of test data points.
test_column_name DOUBLE PRECISION[]. The test data points.
prediction INTEGER. Label in case of classification, average value in case
of regression.
k_nearest_neighbours INTEGER[]. List of nearest neighbors, sorted closest to
furthest from the corresponding test point.
{code}
A distance vector could help users troubleshoot issues like this in the future.
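As a rough picture of what the requested output could look like, the following Python sketch returns the neighbor ids together with a matching list of distances, both sorted closest to furthest. The function `knn_with_distances` and the dict-based `points` input are hypothetical stand-ins for `point_source`, not the MADlib interface, and this brute-force search is only an illustration.

```python
import math


def knn_with_distances(points, query, k):
    """Return (neighbor_ids, distances) for the k points closest to query.

    points: dict mapping an integer id to a list of floats
            (hypothetical stand-in for point_source).
    query:  list of floats (a single test point).
    """
    scored = []
    for pid, vec in points.items():
        sq = sum((x - y) ** 2 for x, y in zip(vec, query))
        scored.append((math.sqrt(sq), pid))
    scored.sort()  # ascending by distance; ties are broken by id
    top = scored[:k]
    neighbor_ids = [pid for _, pid in top]
    distances = [dist for dist, _ in top]
    return neighbor_ids, distances
```

When the query point is itself in `points`, the first returned distance should be 0, which makes a missing 0-distance neighbor easy to spot.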
was:
In unsupervised mode of knn
http://madlib.apache.org/docs/latest/group__grp__knn.html
when `point_source` and `test_source` are the same data set, nearest neighbors
is not reliably returning the 0 distance point as a nearest neighbor.
Could there be a small negative-value issue here?
Also, please assess if we can add a vector of distances to the output file
{code}
Output Format
The output of the KNN module is a table with the following columns:
id INTEGER. The ids of test data points.
test_column_name DOUBLE PRECISION[]. The test data points.
prediction INTEGER. Label in case of classification, average value in case
of regression.
k_nearest_neighbours INTEGER[]. List of nearest neighbors, sorted closest to
furthest from the corresponding test point.
{code}
A distance vector could help users troubleshoot issues like this in the future.
> Knn in unsupervised mode not producing consistent results
> ---------------------------------------------------------
>
> Key: MADLIB-1370
> URL: https://issues.apache.org/jira/browse/MADLIB-1370
> Project: Apache MADlib
> Issue Type: Bug
> Components: k-NN
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.17