Rahul Iyer created MADLIB-1257:
----------------------------------
Summary: PostgreSQL crashed during random forest training
Key: MADLIB-1257
URL: https://issues.apache.org/jira/browse/MADLIB-1257
Project: Apache MADlib
Issue Type: Bug
Components: Module: Random Forest
Reporter: Rahul Iyer
*Bug reported by Luyao Chen*
I ran into a problem when training a random forest (300 features) on grouped
data. Small data was fine (e.g., 56K instances in 56 groups), but training failed
for 240K instances in 250 groups. In verbose mode, Postgres forcibly disconnected
the session after showing the messages below:
{code:sql}
NOTICE: view "__madlib_temp_60124179_1532371657_7130296__" will be a temporary view
NOTICE: sql_create_empty_result_table:
CREATE TABLE analysis.dx_rf_train_output_1 (
gid integer,
sample_id integer,
tree madlib.bytea8);
NOTICE: sql_refresh_training_pois_cnt:
TRUNCATE TABLE
__madlib_temp_91155016_1532371657_5660955__ CASCADE;
INSERT INTO
__madlib_temp_91155016_1532371657_5660955__
SELECT
*,
madlib.poisson_random(1) AS poisson_count
FROM
(
SELECT
*,
0.::double precision AS
__madlib_temp_14328459_1532371657_7318497__
FROM analysis.dxpredict_svec
) subq
WHERE __madlib_temp_14328459_1532371657_7318497__ <
1
NOTICE:
src_cnt: 158360,
oob_cnt: 92418,
dup_cnt: 250617.
NOTICE: Started tree building for all groups
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
{code}
PostgreSQL did not capture a detailed log, even after I set log_statement to
"all". The server log shows only the following:
{code:sql}
2018-07-23 14:47:50.229 EDT [1090] LOG: server process (PID 1980) was terminated by signal 11: Segmentation fault
2018-07-23 14:47:50.229 EDT [1090] DETAIL: Failed process was running: SELECT madlib.forest_train('analysis.dxpredict_svec',
'analysis.dx_rf_train_output_1',
'rowid',
'positive',
'*',
'rowid,positive,case_icd',
'case_icd',
30::integer,
30::integer,
TRUE::boolean,
1::integer,
10::integer,
3::integer,
1::integer,
10::integer,
NULL,
TRUE
);
2018-07-23 14:47:50.229 EDT [1090] LOG: terminating any other active server processes
2018-07-23 14:47:50.229 EDT [1401] WARNING: terminating connection because of crash of another server process
{code}
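For readability, here is the same call from the server log with each positional argument labeled. The labels are an assumption based on the parameter order documented for {{madlib.forest_train}} in MADlib 1.x. The optional trailing {{sample_ratio}} argument is not passed, so it takes its default. Verify the labels against the documentation for the installed MADlib version before relying on them.

{code:sql}
-- Same call as in the server log; comments name each positional argument.
-- Argument order assumed from the MADlib 1.x forest_train documentation.
SELECT madlib.forest_train(
    'analysis.dxpredict_svec',        -- training_table_name
    'analysis.dx_rf_train_output_1',  -- output_table_name
    'rowid',                          -- id_col_name
    'positive',                       -- dependent_variable
    '*',                              -- list_of_features: all columns except the excluded ones
    'rowid,positive,case_icd',        -- list_of_features_to_exclude
    'case_icd',                       -- grouping_cols (250 groups in the failing run)
    30::integer,                      -- num_trees
    30::integer,                      -- num_random_features
    TRUE::boolean,                    -- importance
    1::integer,                       -- num_permutations
    10::integer,                      -- max_tree_depth
    3::integer,                       -- min_split
    1::integer,                       -- min_bucket
    10::integer,                      -- num_splits
    NULL,                             -- null_handling_params (use defaults)
    TRUE                              -- verbose
    -- sample_ratio not passed: default applies
);
{code}

Running this statement against the 240K-row table is expected to reproduce the crash, so it should only be run on a test server.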