[jira] [Created] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

greg sterijevski (JIRA) Wed, 29 Jun 2011 21:58:08 -0700

Current Multiple Regression Object does calculations with all data incore. 
There are non incore techniques which would be useful with large datasets.
-----------------------------------------------------------------------------------------------------------------------------------------------------


                 Key: MATH-607
                 URL: https://issues.apache.org/jira/browse/MATH-607
             Project: Commons Math
          Issue Type: New Feature
    Affects Versions: 3.0
         Environment: Java
            Reporter: greg sterijevski
             Fix For: 3.0


The current multiple regression class performs a QR decomposition on the 
complete data set, which requires loading the entire data set into memory 
(in-core). For large data sets, especially when data mining or stepwise 
regression is also required, this is not practical. There are techniques that 
form the normal equations on the fly, as well as ones that update the QR 
decomposition incrementally as observations arrive. As a first step, I propose 
specifying an "UpdatingLinearRegression" interface that defines the basic 
functionality all such techniques must provide. 
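
To make the idea concrete, here is a minimal sketch of what such an interface 
could look like, with one of the techniques mentioned above (forming the normal 
equations on the fly). The method names, the NormalEquationsRegression class and 
the 1e-12 singularity threshold are assumptions for illustration only, not an 
agreed API. The caller is assumed to supply a leading 1.0 regressor if an 
intercept is wanted.

```java
import java.util.Arrays;

/** Hypothetical interface; the method names are illustrative assumptions. */
interface UpdatingLinearRegression {
    /** Folds one observation (regressors x, response y) into the running state. */
    void addObservation(double[] x, double y);
    /** Number of observations added so far. */
    long getN();
    /** Solves for the coefficients using only the accumulated state. */
    double[] estimate();
    /** Discards all accumulated observations. */
    void clear();
}

/** Accumulates X'X and X'y on the fly; storage is O(p^2), independent of n. */
final class NormalEquationsRegression implements UpdatingLinearRegression {
    private final int p;            // number of regressors
    private final double[][] xtx;   // running X'X
    private final double[] xty;     // running X'y
    private long n;

    NormalEquationsRegression(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    @Override
    public void addObservation(double[] x, double y) {
        if (x.length != p) {
            throw new IllegalArgumentException("expected " + p + " regressors");
        }
        for (int i = 0; i < p; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < p; j++) {
                xtx[i][j] += x[i] * x[j];
            }
        }
        n++;
    }

    @Override
    public long getN() {
        return n;
    }

    @Override
    public double[] estimate() {
        if (n < p) {
            throw new IllegalStateException("not enough observations");
        }
        // Work on an augmented copy [X'X | X'y] so the running sums stay intact.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            System.arraycopy(xtx[i], 0, a[i], 0, p);
            a[i][p] = xty[i];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) { // assumed threshold
                throw new IllegalStateException("singular normal equations");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double sum = a[i][p];
            for (int j = i + 1; j < p; j++) {
                sum -= a[i][j] * beta[j];
            }
            beta[i] = sum / a[i][i];
        }
        return beta;
    }

    @Override
    public void clear() {
        for (double[] row : xtx) {
            Arrays.fill(row, 0.0);
        }
        Arrays.fill(xty, 0.0);
        n = 0;
    }
}
```

Only the p-by-p sums are kept, so observations can be streamed from disk and 
discarded. Forming the normal equations squares the condition number of the 
problem, so a QR-updating implementation of the same interface would likely be 
preferable for ill-conditioned data.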

Related to this 'updating' regression, the results of running a regression on 
some subset of the data should be encapsulated in an immutable object, so that 
subsequent additions of observations cannot corrupt the parameter estimates or 
render them inconsistent. I am calling this interface "RegressionResults".
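
As a rough illustration of the immutability requirement, the sketch below 
shows one way a results object could be kept independent of later updates. The 
accessor names and the ImmutableRegressionResults class are hypothetical; only 
the "RegressionResults" name comes from this proposal.

```java
/** Hypothetical read-only view of a finished regression. */
interface RegressionResults {
    /** Returns a copy of the parameter estimates. */
    double[] getParameterEstimates();
    /** Number of observations the estimates are based on. */
    long getN();
}

/** Snapshot implementation: copies on the way in and on the way out. */
final class ImmutableRegressionResults implements RegressionResults {
    private final double[] parameters;
    private final long n;

    ImmutableRegressionResults(double[] parameters, long n) {
        // Defensive copy, so later changes to the caller's array are not seen.
        this.parameters = parameters.clone();
        this.n = n;
    }

    @Override
    public double[] getParameterEstimates() {
        // Return a copy, so callers cannot modify the stored estimates.
        return parameters.clone();
    }

    @Override
    public long getN() {
        return n;
    }
}
```

With this shape, a regression object can hand out a snapshot after any number 
of observations and keep accepting data, while the snapshot stays fixed.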

Once the community has reached a consensus on the interface, work on the 
concrete implementation of these techniques will take place.

Thanks,

-Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
