[GitHub] [madlib] fmcquillan99 commented on issue #433: Kmeans: Add automatic optimal cluster estimation

GitBox Mon, 09 Sep 2019 10:57:52 -0700

fmcquillan99 commented on issue #433: Kmeans: Add automatic optimal cluster estimation
URL: https://github.com/apache/madlib/pull/433#issuecomment-529595386
 
 
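   Minor verbosity issue: the call below prints a Greenplum NOTICE (with HINT and CONTEXT) about the missing 'DISTRIBUTED BY' clause when the output table is created.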
   ```
   madlib=# SELECT madlib.kmeanspp_auto(
   madlib(# 'km_sample',
   madlib(# 'k_auto1',
   madlib(# 'points', 
   madlib(# ARRAY[2,3],
   madlib(# 'madlib.squared_dist_norm2',
   madlib(# 'madlib.avg', 
   madlib(# 20, 
   madlib(# 0.001,
   madlib(# 1.0,
   madlib(# 'elbow'
   madlib(# );
   NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'k' as the Greenplum Database data distribution key for this table.
   HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
   CONTEXT:  SQL statement "
           CREATE TABLE k_auto1 (
               k INTEGER,
               centroids   DOUBLE PRECISION[][],
               cluster_variance    DOUBLE PRECISION[],
               objective_fn    DOUBLE PRECISION,
               frac_reassigned DOUBLE PRECISION,
               num_iterations  INTEGER
               
               , elbow DOUBLE PRECISION)
           "
   PL/Python function "kmeanspp_auto"
    kmeanspp_auto 
   ---------------
    
   (1 row)
   
   ```
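
   A possible session-level workaround, sketched under assumptions: raising the standard PostgreSQL setting `client_min_messages` to `warning` should hide NOTICE-level output (including its HINT and CONTEXT lines) from the client. This is not a fix inside `kmeanspp_auto` itself, and it assumes the setting is honored for statements run by the PL/Python function in this Greenplum build.

   ```sql
   -- Assumption: hiding NOTICE-level messages is acceptable for this session.
   SET client_min_messages TO warning;

   -- Same call as in the session above, unchanged.
   SELECT madlib.kmeanspp_auto(
       'km_sample',
       'k_auto1',
       'points',
       ARRAY[2,3],
       'madlib.squared_dist_norm2',
       'madlib.avg',
       20,
       0.001,
       1.0,
       'elbow'
   );

   -- Restore the session default afterwards.
   RESET client_min_messages;
   ```

   This only changes what the client displays; the output table is still created without an explicit distribution clause.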

