Daniel Daniel created MADLIB-1460:
-------------------------------------

             Summary: Prevent an "integer out of range" exception in linear 
regression train
                 Key: MADLIB-1460
                 URL: https://issues.apache.org/jira/browse/MADLIB-1460
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Linear Regression
            Reporter: Daniel Daniel


Linear regression training produces two output tables (*neither is optional*): 
 * The *primary* output table, which includes the computed coefficients.
 * A *summary* output table, which contains a single row.

+Scenario+

Running linear regression training in PostgreSQL on an input table that 
has *2^31 or more records* (even if a grouping column is specified) fails 
with an "*integer out of range*" exception.

+Source+

*The summary table* has a column that stores *the total number of records* 
involved in the computation. The column's data type is a *signed integer*. 
However, the total number of records is computed as a *BIGINT*. Therefore, when 
the total number of records in the input table is beyond the range of a signed 
integer (i.e., it exceeds 2^31 - 1), an "integer out of range" exception is 
thrown.
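
To illustrate the range limit described above, here is a minimal Python sketch. It is not MADlib code: it only uses the standard `struct` module to mimic a 4-byte signed field (like an integer column) and an 8-byte signed field (like a BIGINT column). The helper name `fits` and the sample row count are assumptions made for the illustration.

```python
import struct

# Largest value a 4-byte signed integer can hold.
INT4_MAX = 2**31 - 1


def fits(fmt: str, value: int) -> bool:
    """Return True if value can be packed into the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# Hypothetical row count, one past the 4-byte signed maximum.
row_count = 2**31

print(fits("<i", INT4_MAX))   # 4-byte signed field -> True
print(fits("<i", row_count))  # 4-byte signed field -> False
print(fits("<q", row_count))  # 8-byte signed field -> True
```

The same count that overflows the 4-byte field fits comfortably in the 8-byte one, which is why a record count computed as a BIGINT cannot always be stored in a signed integer column.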

+Solution+

A simple solution is to change the data type of the column from *signed 
integer* to *BIGINT*. 

+Test+

We executed the linear regression training function with and without the 
suggested modification on an input table having between 2^31 and 2^32 records. 
Without the modification, an "integer out of range" exception was thrown. With 
the modification, training completed successfully. 

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
