Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214485090

--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id,
     if label_column_name and label_column_name.strip():
         cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
 
-    cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-    cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-    if not is_col_array(point_source, point_column_name):
-        plpy.error("kNN Error: Feature column '{0}' in train table is not"
-                   " an array.".format(point_column_name))
-    if not is_col_array(test_source, test_column_name):
-        plpy.error("kNN Error: Feature column '{0}' in test table is not"
-                   " an array.".format(test_column_name))
--- End diff --

The `point_column_name` and `test_column_name` params must be arrays, as the removed `if` checks suggest. If either one is not an array, the call fails further down when the distance function (such as `squared_dist_norm2`) is invoked. I don't think `is_var_valid()` checks that these are arrays. You may have to check them after the new asserts, using a new helper function: `is_col_array()` cannot be used as is for expressions, and `is_var_valid()` only checks that the expression is valid, not that it is an array.
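
One possible shape for such a helper is sketched below. This is only an illustration, not code from the PR: the name `is_expr_array` and its `execute` parameter are hypothetical, and it assumes the expression's type can be read with PostgreSQL's `pg_typeof()`, whose text form ends in `[]` for array types. Inside the module, `plpy.execute` could presumably be passed as `execute`; a stub is used here so the sketch runs outside the database.

```python
def is_expr_array(execute, tbl, expr):
    """Return True if `expr`, evaluated against `tbl`, has an array type.

    Hypothetical helper. `execute` is any callable that runs a SQL string
    and returns a sequence of dict-like rows (assumed to be compatible
    with what plpy.execute returns).
    """
    sql = ("SELECT pg_typeof({0})::text AS expr_type "
           "FROM {1} LIMIT 1").format(expr, tbl)
    rows = execute(sql)
    if len(rows) == 0:
        # Empty table: this query cannot determine the type, so report
        # "not an array" and let the caller decide how to handle it.
        return False
    return rows[0]["expr_type"].endswith("[]")


def make_stub_execute(type_name):
    """Build a stand-in for plpy.execute that reports a fixed type name."""
    def stub_execute(sql):
        if type_name is None:
            return []  # simulate an empty table
        return [{"expr_type": type_name}]
    return stub_execute
```

The caller would then raise the same kind of error the removed lines did, for example `plpy.error(...)` when `is_expr_array(plpy.execute, point_source, point_column_name)` is false.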
---