[ https://issues.apache.org/jira/browse/MADLIB-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1380:
------------------------------------
    Description:
{code}
kmeans_random( rel_source,
               expr_point,
               k,                    -- a single value (as now) or an array of k values
               fn_dist,              -- optional
               agg_centroid,         -- optional
               max_num_iterations,   -- optional
               min_frac_reassigned,  -- optional
               k_selection_algorithm -- optional (only applies if 'k' is an array with multiple k values)
             )

kmeanspp( rel_source,
          expr_point,
          k,                    -- a single value (as now) or an array of k values
          fn_dist,              -- optional
          agg_centroid,         -- optional
          max_num_iterations,   -- optional
          min_frac_reassigned,  -- optional
          seeding_sample_ratio, -- optional
          k_selection_algorithm -- optional (only applies if 'k' is an array with multiple k values)
        )

k
    INTEGER or INTEGER[]. The number of centroids to calculate. Can be a single
    value or an array of k values to explore. If an array of k values is given,
    the parameter 'k_selection_algorithm' determines the evaluation method.

k_selection_algorithm (optional)
    TEXT, default: 'elbow'. Method used to select the number of centroids k.
    Only applies if the parameter 'k' is an array with multiple k values.
    Two approaches are currently supported: 'elbow' and 'silhouette'.
    The value can be any abbreviation of these strings; e.g., 'silh' selects
    the silhouette method.
{code}
e.g.,
{code}
SELECT * FROM madlib.kmeanspp (
    'km_sample',                  -- rel_source
    'points',                     -- expr_point
    ARRAY[2, 4, 6, 8, 10],        -- k
    'madlib.squared_dist_norm2',  -- fn_dist
    'madlib.avg',                 -- agg_centroid
    20,                           -- max_num_iterations
    0.001,                        -- min_frac_reassigned
    1.0,                          -- seeding_sample_ratio
    'elbow'                       -- k_selection_algorithm
);
{code}

> Select number of centroids in k-means
> -------------------------------------
>
>                 Key: MADLIB-1380
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1380
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: k-Means Clustering
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
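A minimal, hypothetical Python sketch of how choosing k from an array of candidates could work, for illustration only and not taken from MADlib. Assumptions: the helper names (select_k, resolve_algorithm, kmeans, wcss, mean_silhouette) are invented here; squared Euclidean distance and the component-wise mean stand in for 'madlib.squared_dist_norm2' and 'madlib.avg'; seeding is deterministic farthest-point rather than random or k-means++; the elbow is taken as the interior k with the largest discrete second difference of the within-cluster sum of squares, which is only one of several elbow heuristics; and abbreviations of 'k_selection_algorithm' are modelled as prefix matches.

{code:python}
# Hypothetical sketch for illustration; not MADlib's implementation.
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
ALGORITHMS = ("elbow", "silhouette")


def resolve_algorithm(name: str) -> str:
    """Expand an abbreviated k_selection_algorithm value (assumed: prefix match)."""
    key = name.strip().lower()
    matches = [a for a in ALGORITHMS if key and a.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"unknown k_selection_algorithm: {name!r}")
    return matches[0]


def sq_dist(a: Point, b: Point) -> float:  # squared Euclidean, standing in for fn_dist
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members: Sequence[Point]) -> Point:  # mean, standing in for agg_centroid
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 20) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point seeding; one label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels


def group(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Collect the points of each non-empty cluster."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def wcss(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-cluster sum of squared distances, the curve the elbow method inspects."""
    total = 0.0
    for members in group(points, labels).values():
        mid = centroid(members)
        total += sum(sq_dist(p, mid) for p in members)
    return total


def mean_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Average of s = (b - a) / max(a, b) over all points (Euclidean distance)."""
    groups = group(points, labels)
    if len(groups) < 2:
        return -1.0  # undefined for a single cluster; rank it last
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # dist(p, p) adds 0
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in groups.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(points: Sequence[Point], ks: Sequence[int], k_selection_algorithm: str = "elbow") -> int:
    """Cluster once per candidate k and return the k preferred by the chosen method."""
    method = resolve_algorithm(k_selection_algorithm)
    candidates = sorted(set(ks))
    runs = {k: kmeans(points, k) for k in candidates}
    if method == "silhouette":
        return max(candidates, key=lambda k: mean_silhouette(points, runs[k]))
    if len(candidates) < 3:
        raise ValueError("the elbow heuristic needs at least three k values")
    w = [wcss(points, runs[k]) for k in candidates]
    # Elbow: interior k with the largest discrete second difference of WCSS
    # (this ignores uneven spacing between candidate k values).
    i = max(range(1, len(candidates) - 1), key=lambda j: w[j - 1] - 2 * w[j] + w[j + 1])
    return candidates[i]


# Illustrative data: three well-separated 2x2 grids of points.
blobs = [(0, 0), (10, 0), (0, 20)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx in (0, 1) for dy in (0, 1)]
print(select_k(points, [2, 3, 4, 5], "elbow"))
print(select_k(points, [2, 3, 4, 5], "silh"))
{code}

The silhouette criterion needs no curve-shape heuristic, but it costs O(n^2) distance computations per candidate k, whereas the elbow criterion only reuses the objective that k-means already minimizes.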