[ https://issues.apache.org/jira/browse/MADLIB-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581752#comment-16581752 ]
Frank McQuillan edited comment on MADLIB-1060 at 8/16/18 12:11 AM:
-------------------------------------------------------------------
I think so. Each expression must evaluate to the type those parameters already
require, which is a numeric array:
{code}
point_source
TEXT. Name of the table containing the training data points. Training data
points are expected to be stored row-wise in a column of type DOUBLE
PRECISION[].
point_column_name
TEXT. Name of the column with training data points.
{code}
and
{code}
test_source
TEXT. Name of the table containing the test data points. Testing data points
are expected to be stored row-wise in a column of type DOUBLE PRECISION[].
test_column_name
TEXT. Name of the column with testing data points.
{code}
If the user supplies an expression that does not evaluate to a numeric array,
the call fails with the error below, which is acceptable behavior:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: kNN Error: Feature column
'data' in test table is not an array.
CONTEXT: Traceback (most recent call last):
PL/Python function "knn", line 33, in <module>
weighted_avg
PL/Python function "knn", line 160, in knn
PL/Python function "knn", line 63, in knn_validate_src
PL/Python function "knn"
PL/pgSQL function madlib.knn(character varying,character varying,character
varying,character varying,character varying,character varying,character
varying,character varying,integer,boolean,text) line 5 at assignment
[SQL: "SELECT * FROM madlib.knn(\n 'knn_train_data', --
Table of training data\n 'data', -- Col name of
training data\n 'id', -- Col name of id in
train data\n 'label', -- Training labels\n
'knn_test_data', -- Table of test data\n 'data',
-- Col name of test data\n 'id', --
Col name of id in test data\n 'knn_result_classification', --
Output table\n 3, -- Number of nearest
neighbors\n True, -- True to list
nearest-neighbors by id\n 'madlib.squared_dist_norm2' --
Distance function\n );"]
{code}
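To make the type requirement concrete, here is a minimal sketch of the kind of check involved. It assumes the caller has already obtained the type name of the expression (for example from PostgreSQL's pg_typeof). The helper names is_numeric_array_type and validate_feature_expression are hypothetical and are not MADlib's actual validation code.

```python
import re

# Hypothetical helpers for illustration only; not part of MADlib.
# Base types assumed to count as "numeric" for a feature array.
NUMERIC_BASE_TYPES = {
    "smallint", "integer", "bigint", "real", "double precision", "numeric",
}


def is_numeric_array_type(type_name):
    """Return True if a PostgreSQL type name such as 'double precision[]'
    denotes an array of a numeric base type."""
    name = type_name.strip().lower()
    if not name.endswith("[]"):
        return False
    base = name[:-2].strip()
    # Drop a trailing type modifier, e.g. 'numeric(10,2)' -> 'numeric'.
    base = re.sub(r"\(.*\)$", "", base).strip()
    return base in NUMERIC_BASE_TYPES


def validate_feature_expression(expr, type_name, table_role):
    """Raise ValueError if the expression's type is not a numeric array.

    expr       -- the column name or expression text supplied by the user
    type_name  -- the type that expression evaluates to
    table_role -- 'train' or 'test', used only in the error message
    """
    if not is_numeric_array_type(type_name):
        raise ValueError(
            "Feature expression '{0}' in {1} table is not a numeric array "
            "(got type {2}).".format(expr, table_role, type_name)
        )
```

With this sketch, an expression whose type is double precision[] passes, while one whose type is text or text[] is rejected before any distance computation runs.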
> Support expressions for column names in k-NN
> --------------------------------------------
>
> Key: MADLIB-1060
> URL: https://issues.apache.org/jira/browse/MADLIB-1060
> Project: Apache MADlib
> Issue Type: Improvement
> Components: k-NN
> Reporter: Frank McQuillan
> Assignee: Himanshu Pandey
> Priority: Minor
> Labels: starter
> Fix For: v2.0
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> {code}
> knn( point_source,
> point_column_name,
> label_column_name,
> test_source,
> test_column_name,
> id_column_name,
> output_table,
> operation,
> k
> )
> {code}
> Possible improvements:
> 1) The parameters 'point_column_name' and 'test_column_name' should support
> regular PostgreSQL expressions.
> 2) Should we infer the operation, 'c' (classification) or 'r' (regression),
> from the data types, rather than requiring it to be stated explicitly?
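
One possible reading of improvement 2 in the description above, sketched under stated assumptions: floating-point label types suggest regression ('r') and integer, boolean, or text label types suggest classification ('c'). The type-to-operation mapping and the function name infer_operation are hypothetical, not existing MADlib behavior. Integer labels in particular are ambiguous, so an explicit override would probably still be needed.

```python
# Hypothetical heuristic for illustration only; not MADlib behavior.
# Assumed mapping from the label column's PostgreSQL type to the operation.
REGRESSION_TYPES = {"real", "double precision", "numeric"}
CLASSIFICATION_TYPES = {
    "smallint", "integer", "bigint", "boolean", "text", "character varying",
}


def infer_operation(label_type):
    """Return 'r' for regression or 'c' for classification based on the
    label column's type name; raise ValueError if the type is not covered."""
    name = label_type.strip().lower()
    if name in REGRESSION_TYPES:
        return "r"
    if name in CLASSIFICATION_TYPES:
        return "c"
    raise ValueError(
        "Cannot infer operation from label type '{0}'; "
        "specify 'c' or 'r' explicitly.".format(label_type)
    )
```

Raising on unknown types, rather than guessing, keeps the explicit parameter available as a fallback.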