[ https://issues.apache.org/jira/browse/MADLIB-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266483#comment-17266483 ]
Frank McQuillan commented on MADLIB-1460:
-----------------------------------------

(1) BIGINT indep and dep vars in the BIGINT space

Train
{code}
DROP TABLE IF EXISTS tab1;
CREATE TABLE tab1(
    indep_var BIGINT,
    dep_var BIGINT
);
INSERT INTO tab1 VALUES(100000000000, 100000000000);
INSERT INTO tab1 VALUES(200000000000, 200000000000);
INSERT INTO tab1 VALUES(300000000000, 300000000000);
INSERT INTO tab1 VALUES(400000000000, 400000000000);
INSERT INTO tab1 VALUES(500000000000, 500000000000);

DROP TABLE IF EXISTS test_linregr, test_linregr_summary;
SELECT madlib.linregr_train( 'tab1',
                             'test_linregr',
                             'dep_var',
                             'ARRAY[1, indep_var]'
                           );
{code}
{code}
madlib=# select * from test_linregr_summary;
-[ RECORD 1 ]------------+--------------------
method                   | linregr
source_table             | tab1
out_table                | test_linregr
dependent_varname        | dep_var
independent_varname      | ARRAY[1, indep_var]
num_rows_processed       | 5
num_missing_rows_skipped | 0
grouping_col             |
{code}
{code}
madlib=# select * from test_linregr;
-[ RECORD 1 ]------------+-------------------------
coef                     | {2.72727272727273e-12,1}
r2                       | 1
std_err                  | {0,0}
t_stats                  | {Infinity,Infinity}
p_values                 | {NaN,NaN}
condition_no             | 777817459305.202
num_rows_processed       | 5
num_missing_rows_skipped | 0
variance_covariance      | {{0,0},{0,0}}
{code}

Predict
{code}
madlib=# SELECT madlib.linregr_predict( m.coef, ARRAY[1,indep_var]) as predict FROM tab1, test_linregr m;
   predict
--------------
 300000000000
 500000000000
 100000000000
 200000000000
 400000000000
(5 rows)
{code}

> Prevent an "integer out of range" exception in linear regression train
> ----------------------------------------------------------------------
>
>                 Key: MADLIB-1460
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1460
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Linear Regression
>            Reporter: Daniel Daniel
>            Priority: Minor
>             Fix For: v1.18.0
>
>
> Linear regression training produces 2 output tables (*neither is
> optional*):
> * The *primary* output table, which includes the computed coefficients.
> * A *summary* output table, which contains a single row.
> +Scenario+
> Running linear regression training in PostgreSQL on an input table that
> has *more than 2^31 - 1 records* (even if a grouping column is
> specified) fails with an "*integer out of range*" exception.
> +Source+
> *The summary table* has a column that stores *the total number of records*
> involved in the computation. The column's data type is a *signed integer*.
> However, the total number of records is computed as a *BIGINT*. Therefore,
> when the total number of records in the input table exceeds the maximum
> value of a signed 32-bit integer (2^31 - 1), an "integer out of range"
> exception is thrown.
> +Solution+
> A simple solution is to change the data type of the column from a *signed
> integer* to *BIGINT*.
> +Test+
> We executed the linear regression training function with and without the
> suggested modification on an input table having between 2^31 and 2^32
> records. Without the modification, an "integer out of range" exception was
> thrown. With the modification, training completed without error.
>
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)