[ 
https://issues.apache.org/jira/browse/MADLIB-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492940#comment-15492940
 ] 

medart commented on MADLIB-1016:
--------------------------------

Hello,

Sorry for the late response.
Here is what you asked for:

SELECT madlib.version() shows this output:

MADlib version: 1.9, git revision: rc/v1.9-rc1, cmake configuration time: Thu Apr  7 18:43:03 UTC 2016, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0


And the error when we use the decision tree:

CONTEXT:  PL/Python function "tree_train"
ERROR:  plpy.SPIError: ORDER BY specified, but array_agg is not an ordered aggregate function (plpython.c:4648)
LINE 1: SELECT array_agg(quote_ident(attname)::varchar
               ^
QUERY:  SELECT array_agg(quote_ident(attname)::varchar
                                     ORDER BY attnum) AS cols
                    FROM pg_attribute
                    WHERE attrelid = 'dt_golf'::regclass
                      AND NOT attisdropped
                      AND attnum > 0
CONTEXT:  Traceback (most recent call last):
  PL/Python function "tree_train", line 28, in <module>
    surrogate_params, verbose_mode)
  PL/Python function "tree_train", line 580, in tree_train
  PL/Python function "tree_train", line 64, in _tree_validate_args
  PL/Python function "tree_train", line 383, in columns_exist_in_table
  PL/Python function "tree_train", line 300, in get_cols
PL/Python function "tree_train"

The same happens with k-means:

ERROR:  plpy.SPIError: ORDER BY specified, but array_agg is not an ordered aggregate function (plpython.c:4648)
LINE 11:                         array_agg(_new_centroid_id ORDER BY ...
                                 ^
QUERY:
                INSERT INTO pg_temp._madlib_kmeans_state
                SELECT
                    1,
                    (
                SELECT
                    CAST((
                        madlib.matrix_agg(
                            _centroid::FLOAT8[]
                            ORDER BY _new_centroid_id),
                        array_agg(_new_centroid_id ORDER BY _new_centroid_id),
                        sum(_objective_fn),
                        CAST(sum(_num_reassigned) AS DOUBLE PRECISION)
                            / sum(_num_points)
                    ) AS madlib.kmeans_state)
                FROM (
                    SELECT
                        (_new_centroid).column_id AS _new_centroid_id,
                        sum((_new_centroid).distance) AS _objective_fn,
                        count(*) AS _num_points,
                        sum(
                            CAST(
                                coalesce(
                                    (CAST(
                                        (SELECT (_state).old_centroid_ids
                                  FROM pg_temp._madlib_kmeans_state as rel_state
                                  WHERE _iteration = 0) AS INTEGER[]
                                    ))[(_new_centroid).column_id + 1] != _old_centroid_id,
                                    TRUE
                                )
                                AS INTEGER
                            )
                        ) AS _num_reassigned,
                        madlib.avg(_point::FLOAT8[]) AS _centroid
                    FROM (
                        SELECT
                            -- PostgreSQL/Greenplum tuning:
                            -- VOLATILE function as optimization fence
                           madlib.noop(),
                            _src.points AS _point,
                            madlib._closest_column(
                                (SELECT (_state).centroids
                              FROM pg_temp._madlib_kmeans_state as rel_state
                              WHERE _iteration = 0)
                                , _src.points::FLOAT8[]
                                , 'squared_dist_norm2'
                                , 'madlib.squared_dist_norm2'
                                )
                            AS _new_centroid,
                            (madlib._closest_column((SELECT (_state).centroids
                                   FROM pg_temp._madlib_kmeans_state as rel_state
                                   WHERE _iteration = 0 - 1
                                )
                                    , _src.points::FLOAT8[]
                                    , 'squared_dist_norm2'
                                    , 'madlib.squared_dist_norm2'
                                    )
                                ).column_id
                             AS _old_centroid_id
                        FROM km_sample AS _src
                        WHERE abs(coalesce(madlib.svec_elsum(points), 'Infinity'::FLOAT8)) < 'Infinity'::FLOAT8
                        AND NOT madlib.array_contains_null(_src.points::FLOAT8[])
                    ) AS _points_with_assignments
                    GROUP BY (_new_centroid).column_id
                ) AS _new_centroids
                )

CONTEXT:  Traceback (most recent call last):
  PL/Python function "internal_compute_kmeans", line 22, in <module>
    return kmeans.compute_kmeans(**globals())
  PL/Python function "internal_compute_kmeans", line 332, in compute_kmeans
  PL/Python function "internal_compute_kmeans", line 236, in update
  PL/Python function "internal_compute_kmeans", line 101, in runSQL
PL/Python function "internal_compute_kmeans"
SQL statement "SELECT  madlib.internal_compute_kmeans( '_madlib_kmeans_args', '_madlib_kmeans_state', textin(regclassout( $1 )),  $2 , textin(regprocout( $3 )))"
PL/pgSQL function "kmeans" line 103 at assignment
SQL statement "SELECT  madlib.kmeans(  $1 ,  $2 , madlib.kmeanspp_seeding( $1 ,  $2 ,  $3 ,  $4 , NULL,  $5 ),  $4 ,  $6 ,  $7 ,  $8 )"
PL/pgSQL function "kmeanspp" line 4 at assignment
SQL statement "SELECT  madlib.kmeanspp( $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 , 1.0::DOUBLE PRECISION)"
PL/pgSQL function "kmeanspp" line 4 at assignment


I hope that helps.


> k-mean & decision Tree don't work because of array_agg in order by clause
> -------------------------------------------------------------------------
>
>                 Key: MADLIB-1016
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1016
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Array Ops
>            Reporter: medart
>
> When we try to use the k-means function or the decision tree, the same error appears:
> ERROR:  plpy.SPIError: ORDER BY specified, but array_agg is not an ordered aggregate function (plpython.c:4648)
> For information, we use MADlib 1.9.1 with Pivotal Greenplum, and we use the
> example data as shown in the official documentation.
> Regression works well, and we have not tried all the algorithms yet to see
> whether the problem appears in the other methods.
> Thank you for your help



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
