fmcquillan99 commented on issue #352: Feature/kd tree knn
URL: https://github.com/apache/madlib/pull/352#issuecomment-463797009

This returns a result:

{code}
DROP TABLE IF EXISTS knn_result_classification_kd;
SELECT madlib.knn(
    'knn_train_data',                -- Table of training data
    'data',                          -- Col name of training data
    'id',                            -- Col name of id in train data
    NULL,                            -- Training labels
    'knn_test_data',                 -- Table of test data
    'data',                          -- Col name of test data
    'id',                            -- Col name of id in test data
    'knn_result_classification_kd',  -- Output table
    1,                               -- Number of nearest neighbors
    True,                            -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2',     -- Distance function
    False,                           -- For weighted average
    'kd_tree',                       -- Use kd-tree
    'depth=1, leaf_nodes=1'          -- Kd-tree options
);
SELECT * FROM knn_result_classification_kd ORDER BY id;
{code}

produces

{code}
 id |  data   | k_nearest_neighbours
----+---------+----------------------
  1 | {2,1}   | {2}
  2 | {2,6}   | {3}
  3 | {15,40} | {7}
  4 | {12,1}  | {4}
  5 | {2,90}  | {9}
  6 | {50,45} | {6}
(6 rows)
{code}

though I have not checked if this result is correct.

But if I search 31 of 32 leaf nodes I get no result set:

{code}
DROP TABLE IF EXISTS knn_result_classification_kd;
SELECT madlib.knn(
    'knn_train_data',                -- Table of training data
    'data',                          -- Col name of training data
    'id',                            -- Col name of id in train data
    NULL,                            -- Training labels
    'knn_test_data',                 -- Table of test data
    'data',                          -- Col name of test data
    'id',                            -- Col name of id in test data
    'knn_result_classification_kd',  -- Output table
    1,                               -- Number of nearest neighbors
    True,                            -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2',     -- Distance function
    False,                           -- For weighted average
    'kd_tree',                       -- Use kd-tree
    'depth=5, leaf_nodes=31'         -- Kd-tree options
);
SELECT * FROM knn_result_classification_kd ORDER BY id;
{code}

produces

{code}
 id | data | k_nearest_neighbours
----+------+----------------------
(0 rows)
{code}

which does not seem right.
In fact, after more testing, I can't get any results for a depth greater than 1:

{code}
DROP TABLE IF EXISTS knn_result_classification_kd;
SELECT madlib.knn(
    'knn_train_data',                -- Table of training data
    'data',                          -- Col name of training data
    'id',                            -- Col name of id in train data
    NULL,                            -- Training labels
    'knn_test_data',                 -- Table of test data
    'data',                          -- Col name of test data
    'id',                            -- Col name of id in test data
    'knn_result_classification_kd',  -- Output table
    1,                               -- Number of nearest neighbors
    True,                            -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2',     -- Distance function
    False,                           -- For weighted average
    'kd_tree',                       -- Use kd-tree
    'depth=2, leaf_nodes=1'          -- Kd-tree options
);
SELECT * FROM knn_result_classification_kd ORDER BY id;
{code}

produces

{code}
 id | data | k_nearest_neighbours
----+------+----------------------
(0 rows)
{code}
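
One way to check whether the depth=1 result is correct would be to compare it against an exact-search run on the same tables. The following is only a sketch under two assumptions: that calling madlib.knn() without the last two arguments falls back to the default (non-kd-tree) search, and that knn_result_classification_kd currently holds the depth=1 output (re-run that call first, since the later runs overwrite the table). The table name knn_result_classification_exact is made up for illustration.

```sql
-- Exact-search baseline: same call, minus the two kd-tree arguments.
-- Assumption: omitting them selects the default exhaustive search.
DROP TABLE IF EXISTS knn_result_classification_exact;
SELECT madlib.knn(
    'knn_train_data',                   -- Table of training data
    'data',                             -- Col name of training data
    'id',                               -- Col name of id in train data
    NULL,                               -- Training labels
    'knn_test_data',                    -- Table of test data
    'data',                             -- Col name of test data
    'id',                               -- Col name of id in test data
    'knn_result_classification_exact',  -- Output table (hypothetical name)
    1,                                  -- Number of nearest neighbors
    True,                               -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2',        -- Distance function
    False                               -- For weighted average
);

-- List test points that are missing from the kd-tree output
-- or whose neighbor list differs from the exact-search output.
SELECT e.id,
       e.k_nearest_neighbours  AS exact_nn,
       kd.k_nearest_neighbours AS kd_tree_nn
FROM knn_result_classification_exact e
LEFT JOIN knn_result_classification_kd kd ON kd.id = e.id
WHERE kd.id IS NULL
   OR e.k_nearest_neighbours <> kd.k_nearest_neighbours
ORDER BY e.id;
```

An empty result would mean the two runs agree. Rows with a NULL kd_tree_nn are test points the kd-tree run dropped entirely. Differing neighbor ids are not necessarily a bug, since kd-tree search that visits only some leaf nodes is approximate and ties in distance can be broken differently.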