David Chen created MADLIB-1136:
----------------------------------

             Summary: Getting "ERROR: plpy.SPIError: Function" when calling 
linregr_train function with big data 
                 Key: MADLIB-1136
                 URL: https://issues.apache.org/jira/browse/MADLIB-1136
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Linear Regression
            Reporter: David Chen


Hi MADlib developers,

We have been trying to use MADlib on Greenplum to run linear regression 
in-database on a large data set (789,626,243 rows, divided into ~475,000 
groups). However, after the following SQL statement has run for a little 
more than ten minutes, it fails with the error message shown below.

SQL statement:
SELECT madlib.linregr_train(
    'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
    'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp', 
    'average_cqi', 'array[1, prb_utilization]',
    'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');

Error message:
ERROR: plpy.SPIError: Function 
"madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString 
improperly aligned for alignment request in seek(). (UDF_impl.hpp:210)  (seg2 
59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)

If we reduce the input data to 269,837,688 rows, the same SQL statement 
completes successfully.

We are not sure whether this is a bug or a problem with how we are using the 
MADlib linear regression function, and we would greatly appreciate any 
pointers you could give us.

We are happy to provide more information about the input data (e.g. the data 
schema) if it is needed for further investigation.

Thank you very much for looking into this issue.

David




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
