David Chen created MADLIB-1136:
----------------------------------
Summary: Getting "ERROR: plpy.SPIError: Function" when calling
linregr_train function with big data
Key: MADLIB-1136
URL: https://issues.apache.org/jira/browse/MADLIB-1136
Project: Apache MADlib
Issue Type: Bug
Components: Module: Linear Regression
Reporter: David Chen
Hi MADlib developers,
We have been trying to use MADlib on Greenplum to run an in-database linear
regression on a large data set (789,626,243 rows, split into ~475,000
groups). However, after the following SQL statement has been running for a
little more than ten minutes, it fails with the error message shown below.
SQL statement:
SELECT madlib.linregr_train(
'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp',
'average_cqi', 'array[1, prb_utilization]',
'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');
Error message:
ERROR: plpy.SPIError: Function
"madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString
improperly aligned for alignment request in seek(). (UDF_impl.hpp:210) (seg2
59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)
If we reduce the input data to 269,837,688 rows, the same SQL statement
completes successfully.
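A query along the following lines could show how the rows are distributed over the groups. It is only a sketch: it uses the source table and the grouping columns from the statement above and nothing else, and whether group size has anything to do with the error is an assumption, not something confirmed here.

```sql
-- Rows per group, largest groups first.
-- Table and column names are copied from the linregr_train call above.
SELECT main_lnbts_id,
       main_lncel_id,
       lnbts_id,
       lncel_id,
       count(*) AS n_rows
FROM xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull
GROUP BY main_lnbts_id, main_lncel_id, lnbts_id, lncel_id
ORDER BY n_rows DESC
LIMIT 20;
```

Comparing this output for the full table and for the reduced input would show whether the two runs differ in more than total row count.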
We are not sure whether this is a bug or a problem with how we are using the
MADlib linear regression function, and we would greatly appreciate any
pointers.
We are happy to provide more information about the input data (e.g. the data
schema) if it is needed for further investigation.
Thank you very much for looking into this issue.
David