tuhaihe commented on PR #627:
URL: https://github.com/apache/madlib/pull/627#issuecomment-3881896614
Hi @zhangyue1818, thanks for your contribution. I tested this PR on
Cloudberry 2.0 and the upcoming Cloudberry 2.1 release, and one test case (`assoc_rules`) failed:
```
[gpadmin@cdw build]$ ./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check
madpack.py: INFO : Detected Apache Cloudberry version 2.0.0.
TEST CASE RESULT|Module: array_ops|array_ops.ic.sql_in|PASS|Time: 74 milliseconds
TEST CASE RESULT|Module: bayes|bayes.ic.sql_in|PASS|Time: 320 milliseconds
TEST CASE RESULT|Module: crf|crf_test_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: crf|crf_train_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: elastic_net|elastic_net.ic.sql_in|PASS|Time: 190 milliseconds
TEST CASE RESULT|Module: linalg|svd.ic.sql_in|PASS|Time: 572 milliseconds
TEST CASE RESULT|Module: linalg|matrix_ops.ic.sql_in|PASS|Time: 822 milliseconds
TEST CASE RESULT|Module: linalg|linalg.ic.sql_in|PASS|Time: 76 milliseconds
TEST CASE RESULT|Module: pmml|pmml.ic.sql_in|PASS|Time: 452 milliseconds
TEST CASE RESULT|Module: prob|prob.ic.sql_in|PASS|Time: 28 milliseconds
TEST CASE RESULT|Module: svm|svm.ic.sql_in|PASS|Time: 315 milliseconds
TEST CASE RESULT|Module: tsa|arima.ic.sql_in|PASS|Time: 1074 milliseconds
TEST CASE RESULT|Module: stemmer|porter_stemmer.ic.sql_in|PASS|Time: 34 milliseconds
TEST CASE RESULT|Module: conjugate_gradient|conj_grad.ic.sql_in|PASS|Time: 142 milliseconds
TEST CASE RESULT|Module: knn|knn.ic.sql_in|PASS|Time: 175 milliseconds
TEST CASE RESULT|Module: lda|lda.ic.sql_in|PASS|Time: 246 milliseconds
TEST CASE RESULT|Module: stats|correlation.ic.sql_in|PASS|Time: 182 milliseconds
TEST CASE RESULT|Module: stats|mw_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|pred_metrics.ic.sql_in|PASS|Time: 255 milliseconds
TEST CASE RESULT|Module: stats|chi2_test.ic.sql_in|PASS|Time: 37 milliseconds
TEST CASE RESULT|Module: stats|anova_test.ic.sql_in|PASS|Time: 47 milliseconds
TEST CASE RESULT|Module: stats|t_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|cox_prop_hazards.ic.sql_in|PASS|Time: 211 milliseconds
TEST CASE RESULT|Module: stats|ks_test.ic.sql_in|PASS|Time: 84 milliseconds
TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.ic.sql_in|PASS|Time: 355 milliseconds
TEST CASE RESULT|Module: stats|wsr_test.ic.sql_in|PASS|Time: 46 milliseconds
TEST CASE RESULT|Module: stats|f_test.ic.sql_in|PASS|Time: 38 milliseconds
TEST CASE RESULT|Module: utilities|utilities.ic.sql_in|PASS|Time: 115 milliseconds
TEST CASE RESULT|Module: utilities|pivot.ic.sql_in|PASS|Time: 119 milliseconds
TEST CASE RESULT|Module: utilities|path.ic.sql_in|PASS|Time: 159 milliseconds
TEST CASE RESULT|Module: utilities|transform_vec_cols.ic.sql_in|PASS|Time: 156 milliseconds
TEST CASE RESULT|Module: utilities|text_utilities.ic.sql_in|PASS|Time: 126 milliseconds
TEST CASE RESULT|Module: utilities|sessionize.ic.sql_in|PASS|Time: 105 milliseconds
TEST CASE RESULT|Module: utilities|encode_categorical.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: utilities|minibatch_preprocessing.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: assoc_rules|assoc_rules.ic.sql_in|FAIL|Time: 568 milliseconds
madpack.py: ERROR : Failed executing /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp
madpack.py: ERROR : Check the log at /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
TEST CASE RESULT|Module: convex|lmf.ic.sql_in|PASS|Time: 297 milliseconds
TEST CASE RESULT|Module: convex|mlp.ic.sql_in|PASS|Time: 507 milliseconds
TEST CASE RESULT|Module: deep_learning|keras_model_arch_table.ic.sql_in|PASS|Time: 149 milliseconds
TEST CASE RESULT|Module: glm|glm.ic.sql_in|PASS|Time: 906 milliseconds
TEST CASE RESULT|Module: graph|graph.ic.sql_in|PASS|Time: 1343 milliseconds
TEST CASE RESULT|Module: linear_systems|sparse_linear_sytems.ic.sql_in|PASS|Time: 132 milliseconds
TEST CASE RESULT|Module: linear_systems|dense_linear_sytems.ic.sql_in|PASS|Time: 125 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|decision_tree.ic.sql_in|PASS|Time: 252 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|random_forest.ic.sql_in|PASS|Time: 322 milliseconds
TEST CASE RESULT|Module: regress|robust.ic.sql_in|PASS|Time: 193 milliseconds
TEST CASE RESULT|Module: regress|logistic.ic.sql_in|PASS|Time: 249 milliseconds
TEST CASE RESULT|Module: regress|linear.ic.sql_in|PASS|Time: 31 milliseconds
TEST CASE RESULT|Module: regress|clustered.ic.sql_in|PASS|Time: 189 milliseconds
TEST CASE RESULT|Module: regress|multilogistic.ic.sql_in|PASS|Time: 323 milliseconds
TEST CASE RESULT|Module: regress|marginal.ic.sql_in|PASS|Time: 457 milliseconds
TEST CASE RESULT|Module: sample|balance_sample.ic.sql_in|PASS|Time: 139 milliseconds
TEST CASE RESULT|Module: sample|train_test_split.ic.sql_in|PASS|Time: 166 milliseconds
TEST CASE RESULT|Module: sample|sample.ic.sql_in|PASS|Time: 20 milliseconds
TEST CASE RESULT|Module: sample|stratified_sample.ic.sql_in|PASS|Time: 112 milliseconds
TEST CASE RESULT|Module: summary|summary.ic.sql_in|PASS|Time: 148 milliseconds
TEST CASE RESULT|Module: kmeans|kmeans.ic.sql_in|PASS|Time: 661 milliseconds
TEST CASE RESULT|Module: pca|pca.ic.sql_in|PASS|Time: 1475 milliseconds
TEST CASE RESULT|Module: pca|pca_project.ic.sql_in|PASS|Time: 528 milliseconds
TEST CASE RESULT|Module: validation|cross_validation.ic.sql_in|PASS|Time: 332 milliseconds
INFO: Log files saved in /tmp/madlib.7qnxdkya
```
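In case it helps when re-running the suite, here is a minimal sketch for pulling only the non-passing entries out of a long install-check run. It assumes madpack prints one `TEST CASE RESULT|...` entry per line in the shape seen above; the names `RESULT_RE` and `failed_tests` are hypothetical helpers, not part of madpack.

```python
import re

# Assumed line shape, taken from the install-check output above:
# TEST CASE RESULT|Module: <module>|<test file>|<PASS or FAIL>|Time: <n> milliseconds
RESULT_RE = re.compile(
    r"^TEST CASE RESULT\|Module: (?P<module>[^|]+)\|(?P<test>[^|]+)"
    r"\|(?P<status>[A-Z]+)\|Time: (?P<ms>\d+) milliseconds"
)


def failed_tests(output):
    """Return (module, test_file) pairs for every result whose status is not PASS."""
    failures = []
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        # Lines that are not result entries (INFO, ERROR, ...) are skipped.
        if match and match.group("status") != "PASS":
            failures.append((match.group("module"), match.group("test")))
    return failures


# Small sample copied from the run above.
sample = "\n".join([
    "TEST CASE RESULT|Module: prob|prob.ic.sql_in|PASS|Time: 28 milliseconds",
    "TEST CASE RESULT|Module: assoc_rules|assoc_rules.ic.sql_in|FAIL|Time: 568 milliseconds",
    "madpack.py: ERROR : Failed executing",
])
print(failed_tests(sample))  # [('assoc_rules', 'assoc_rules.ic.sql_in')]
```

Feeding it the captured output of the `madpack ... install-check` command should list just `assoc_rules` for this run, provided the output is not hard-wrapped.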
```
[gpadmin@cdw build]$ cat /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
-- Switch to test user:
SET ROLE "madlib_210_installcheck_postgres";
SET
-- Set SEARCH_PATH for install-check:
SET search_path=madlib_installcheck_assoc_rules,madlib;
SET
/* -----------------------------------------------------------------------
*//**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*//*
----------------------------------------------------------------------- */
---------------------------------------------------------------------------
-- Rules:
-- ------
-- 1) Any DB objects should be created w/o schema prefix,
-- since this file is executed in a separate schema context.
-- 2) There should be no DROP statements in this script, since
-- all objects created in the default schema will be cleaned-up outside.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
-- Setup:
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION assoc_array_eq
(
arr1 TEXT[],
arr2 TEXT[]
)
RETURNS BOOL AS $$
SELECT COUNT(*) = array_upper($1, 1) AND array_upper($1, 1) =
array_upper($2, 1)
FROM (SELECT unnest($1) id) t1, (SELECT unnest($2) id) t2
WHERE t1.id = t2.id;
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION
CREATE OR REPLACE FUNCTION install_test() RETURNS VOID AS $$
declare
result1 TEXT;
result2 TEXT;
result3 TEXT;
result_maxiter TEXT;
res madlib.assoc_rules_results;
output_schema TEXT;
output_table TEXT;
total_rules INT;
total_time INTERVAL;
begin
DROP TABLE IF EXISTS test_data1;
CREATE TABLE test_data1 (
trans_id INT
, product INT
);
DROP TABLE IF EXISTS test_data2;
CREATE TABLE test_data2 (
trans_id INT
, product VARCHAR
);
INSERT INTO test_data1 VALUES (1,1);
INSERT INTO test_data1 VALUES (1,2);
INSERT INTO test_data1 VALUES (3,3);
INSERT INTO test_data1 VALUES (8,4);
INSERT INTO test_data1 VALUES (10,1);
INSERT INTO test_data1 VALUES (10,2);
INSERT INTO test_data1 VALUES (10,3);
INSERT INTO test_data1 VALUES (19,2);
INSERT INTO test_data2 VALUES (1, 'beer');
INSERT INTO test_data2 VALUES (1, 'diapers');
INSERT INTO test_data2 VALUES (1, 'chips');
INSERT INTO test_data2 VALUES (2, 'beer');
INSERT INTO test_data2 VALUES (2, 'diapers');
INSERT INTO test_data2 VALUES (3, 'beer');
INSERT INTO test_data2 VALUES (3, 'diapers');
INSERT INTO test_data2 VALUES (4, 'beer');
INSERT INTO test_data2 VALUES (4, 'chips');
INSERT INTO test_data2 VALUES (5, 'beer');
INSERT INTO test_data2 VALUES (6, 'beer');
INSERT INTO test_data2 VALUES (6, 'diapers');
INSERT INTO test_data2 VALUES (6, 'chips');
INSERT INTO test_data2 VALUES (7, 'beer');
INSERT INTO test_data2 VALUES (7, 'diapers');
DROP TABLE IF EXISTS test1_exp_result;
CREATE TABLE test1_exp_result (
ruleid integer,
pre text[],
post text[],
support double precision,
confidence double precision,
lift double precision,
conviction double precision
) ;
DROP TABLE IF EXISTS test2_exp_result;
CREATE TABLE test2_exp_result (
ruleid integer,
pre text[],
post text[],
support double precision,
confidence double precision,
lift double precision,
conviction double precision
) ;
INSERT INTO test1_exp_result VALUES (7, '{3}', '{1}',
0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (4, '{2}', '{1}',
0.40000000000000002, 0.66666666666666674, 1.6666666666666667,
1.8000000000000003);
INSERT INTO test1_exp_result VALUES (1, '{1}', '{2,3}',
0.20000000000000001, 0.5, 2.4999999999999996, 1.6000000000000001);
INSERT INTO test1_exp_result VALUES (9, '{2,3}', '{1}',
0.20000000000000001, 1, 2.4999999999999996, 0);
INSERT INTO test1_exp_result VALUES (6, '{1,2}', '{3}',
0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (8, '{3}', '{2}',
0.20000000000000001, 0.5, 0.83333333333333337, 0.80000000000000004);
INSERT INTO test1_exp_result VALUES (5, '{1}', '{2}',
0.40000000000000002, 1, 1.6666666666666667, 0);
INSERT INTO test1_exp_result VALUES (2, '{3}', '{2,1}',
0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (10, '{3,1}', '{2}',
0.20000000000000001, 1, 1.6666666666666667, 0);
INSERT INTO test1_exp_result VALUES (3, '{1}', '{3}',
0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test2_exp_result VALUES (7, '{chips,diapers}', '{beer}',
0.2857142857142857, 1, 1, 0);
INSERT INTO test2_exp_result VALUES (2, '{chips}', '{diapers}',
0.2857142857142857, 0.66666666666666663, 0.93333333333333324,
0.85714285714285698);
INSERT INTO test2_exp_result VALUES (1, '{chips}', '{diapers,beer}',
0.2857142857142857, 0.66666666666666663, 0.93333333333333324,
0.85714285714285698);
INSERT INTO test2_exp_result VALUES (6, '{diapers}', '{beer}',
0.7142857142857143, 1, 1, 0);
INSERT INTO test2_exp_result VALUES (4, '{beer}', '{diapers}',
0.7142857142857143, 0.7142857142857143, 1, 1);
INSERT INTO test2_exp_result VALUES (3, '{chips,beer}', '{diapers}',
0.2857142857142857, 0.66666666666666663, 0.93333333333333324,
0.85714285714285698);
INSERT INTO test2_exp_result VALUES (5, '{chips}', '{beer}',
0.42857142857142855, 1, 1, 0);
res = madlib.assoc_rules (.1, .5, 'trans_id', 'product', 'test_data1','madlib_installcheck_assoc_rules', false);
RETURN;
end $$ language plpgsql;
CREATE FUNCTION
---------------------------------------------------------------------------
-- Test
---------------------------------------------------------------------------
SELECT install_test();
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test_data1" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test_data2" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test1_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test2_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 slice3 172.17.0.6:7002 pid=45213)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 slice1 172.17.0.6:7002 pid=45202)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 172.17.0.6:7002 pid=45137)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: writer gang of current global transaction is lost
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: Any temporary tables for this session have been dropped because the gang was disconnected (session id = 596)
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: ERROR: DTX RollbackAndReleaseCurrentSubTransaction dispatch failed
CONTEXT: PL/Python function "assoc_rules"
PL/pgSQL function install_test() line 93 at assignment
```