[
https://issues.apache.org/jira/browse/MADLIB-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492940#comment-15492940
]
medart commented on MADLIB-1016:
--------------------------------
Hello,
Sorry for the late response.
Here is what you asked for.
SELECT version() shows this output:
MADlib version: 1.9, git revision: rc/v1.9-rc1, cmake configuration time: Thu Apr 7 18:43:03 UTC 2016, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0
And here is the error we get when we use the decision tree:
CONTEXT: PL/Python function "tree_train"
ERROR: plpy.SPIError: ORDER BY specified, but array_agg is not an ordered aggregate function (plpython.c:4648)
LINE 1: SELECT array_agg(quote_ident(attname)::varchar
^
QUERY: SELECT array_agg(quote_ident(attname)::varchar
ORDER BY attnum) AS cols
FROM pg_attribute
WHERE attrelid = 'dt_golf'::regclass
AND NOT attisdropped
AND attnum > 0
CONTEXT: Traceback (most recent call last):
PL/Python function "tree_train", line 28, in <module>
surrogate_params, verbose_mode)
PL/Python function "tree_train", line 580, in tree_train
PL/Python function "tree_train", line 64, in _tree_validate_args
PL/Python function "tree_train", line 383, in columns_exist_in_table
PL/Python function "tree_train", line 300, in get_cols
PL/Python function "tree_train"
The same error occurs in k-means:
ERROR: plpy.SPIError: ORDER BY specified, but array_agg is not an ordered
aggregate function (plpython.c:4648)
LINE 11: array_agg(_new_centroid_id ORDER BY ...
^
QUERY:
INSERT INTO pg_temp._madlib_kmeans_state
SELECT
1,
(
SELECT
CAST((
madlib.matrix_agg(
_centroid::FLOAT8[]
ORDER BY _new_centroid_id),
array_agg(_new_centroid_id ORDER BY _new_centroid_id),
sum(_objective_fn),
CAST(sum(_num_reassigned) AS DOUBLE PRECISION)
/ sum(_num_points)
) AS madlib.kmeans_state)
FROM (
SELECT
(_new_centroid).column_id AS _new_centroid_id,
sum((_new_centroid).distance) AS _objective_fn,
count(*) AS _num_points,
sum(
CAST(
coalesce(
(CAST(
(SELECT (_state).old_centroid_ids
FROM pg_temp._madlib_kmeans_state as rel_state
WHERE _iteration = 0) AS INTEGER[]
))[(_new_centroid).column_id + 1] !=
_old_centroid_id,
TRUE
)
AS INTEGER
)
) AS _num_reassigned,
madlib.avg(_point::FLOAT8[]) AS _centroid
FROM (
SELECT
-- PostgreSQL/Greenplum tuning:
-- VOLATILE function as optimization fence
madlib.noop(),
_src.points AS _point,
madlib._closest_column(
(SELECT (_state).centroids
FROM pg_temp._madlib_kmeans_state as rel_state
WHERE _iteration = 0)
, _src.points::FLOAT8[]
, 'squared_dist_norm2'
, 'madlib.squared_dist_norm2'
)
AS _new_centroid,
(madlib._closest_column((SELECT (_state).centroids
FROM pg_temp._madlib_kmeans_state as rel_state
WHERE _iteration = 0 - 1
)
, _src.points::FLOAT8[]
, 'squared_dist_norm2'
, 'madlib.squared_dist_norm2'
)
).column_id
AS _old_centroid_id
FROM km_sample AS _src
WHERE abs(coalesce(madlib.svec_elsum(points),
'Infinity'::FLOAT8)) < 'Infinity'::FLOAT8
AND NOT
madlib.array_contains_null(_src.points::FLOAT8[])
) AS _points_with_assignments
GROUP BY (_new_centroid).column_id
) AS _new_centroids
)
CONTEXT: Traceback (most recent call last):
PL/Python function "internal_compute_kmeans", line 22, in <module>
return kmeans.compute_kmeans(**globals())
PL/Python function "internal_compute_kmeans", line 332, in compute_kmeans
PL/Python function "internal_compute_kmeans", line 236, in update
PL/Python function "internal_compute_kmeans", line 101, in runSQL
PL/Python function "internal_compute_kmeans"
SQL statement "SELECT madlib.internal_compute_kmeans( '_madlib_kmeans_args', '_madlib_kmeans_state', textin(regclassout( $1 )), $2 , textin(regprocout( $3 )))"
PL/pgSQL function "kmeans" line 103 at assignment
SQL statement "SELECT madlib.kmeans( $1 , $2 , madlib.kmeanspp_seeding( $1 , $2 , $3 , $4 , NULL, $5 ), $4 , $6 , $7 , $8 )"
PL/pgSQL function "kmeanspp" line 4 at assignment
SQL statement "SELECT madlib.kmeanspp( $1 , $2 , $3 , $4 , $5 , $6 , $7 , 1.0::DOUBLE PRECISION)"
PL/pgSQL function "kmeanspp" line 4 at assignment
I hope this helps.
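As a side note on the tree_train failure above: the catalog query that fails uses ORDER BY inside array_agg. One possible way to express the same column listing without an ordered aggregate is the ARRAY(subquery) constructor, with the ORDER BY moved into the subquery. The sketch below only builds the query text; the helper name get_cols_query is hypothetical, this is not MADlib's actual get_cols code or its fix, and it has not been tested on Greenplum.

```python
def get_cols_query(table_name):
    """Build a query that lists a table's columns in attnum order.

    Hypothetical helper for illustration only. It avoids
    array_agg(... ORDER BY ...) by ordering inside an ARRAY(subquery).
    """
    # Double any single quotes so the name is safe inside a SQL string literal.
    safe_name = table_name.replace("'", "''")
    return (
        "SELECT ARRAY(\n"
        "    SELECT quote_ident(attname)::varchar\n"
        "    FROM pg_attribute\n"
        "    WHERE attrelid = '{0}'::regclass\n"
        "      AND NOT attisdropped\n"
        "      AND attnum > 0\n"
        "    ORDER BY attnum\n"
        ") AS cols"
    ).format(safe_name)


if __name__ == "__main__":
    # dt_golf is the table name that appears in the failing query above.
    print(get_cols_query("dt_golf"))
```

The printed statement can be run by hand in psql to check whether the server accepts it. Whether ordering inside the subquery behaves the same way on a distributed Greenplum cluster is an assumption that would need to be verified.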
> k-mean & decision Tree don't work because of array_agg in order by clause
> -------------------------------------------------------------------------
>
> Key: MADLIB-1016
> URL: https://issues.apache.org/jira/browse/MADLIB-1016
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Array Ops
> Reporter: medart
>
> When we try to use the k-means function or the decision tree, the same error appears:
> ERROR: plpy.SPIError: ORDER BY specified, but array_agg is not an ordered
> aggregate function (plpython.c:4648)
> For information, we use MADlib 1.9.1 with Pivotal Greenplum, and we use the
> example data shown in the official documentation.
> Regression works well. We have not tried all the algorithms yet to see whether
> the problem appears in the other methods.
> Thank you for your help.