[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060797#comment-13060797
 ] 

Phil Steitz commented on MATH-607:
----------------------------------

Thanks, I forgot to mention that important point.  Initially, we took the "take 
what we are given" approach, but that proved confusing and error-prone for 
users (forcing them to add a column of ones to the input data).  I think it is
best to expect no column of ones in the design matrix and have the user explicitly 
specify "noIntercept" to estimate a model without an intercept term.  This is 
how the MultipleLinearRegression impls now work.  (See the javadoc for 
newSampleData in AbstractMultipleLinearRegression).  In the updating impls, 
this can work the same way, allowing users to omit initial "1"s from added 
rows.  I guess this will have to be a constructor parameter to work correctly 
in the updating impls.
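
To make the constructor-parameter idea concrete, here is a minimal sketch.
The class name DesignRowBuilder and its method designRow are hypothetical
and are not part of Commons Math; only the "noIntercept" flag name comes from
the existing impls.  The caller passes regressor values only, and the leading
"1" is prepended internally unless noIntercept was requested at construction.

```java
/**
 * Hypothetical helper (not Commons Math API): fixes the intercept choice at
 * construction time so that callers never add a leading 1 themselves.
 */
final class DesignRowBuilder {
    private final boolean noIntercept;

    DesignRowBuilder(boolean noIntercept) {
        this.noIntercept = noIntercept;
    }

    /** Returns the design-matrix row for one observation's regressors. */
    double[] designRow(double[] regressors) {
        if (noIntercept) {
            // Model without an intercept: use the regressors as given.
            return regressors.clone();
        }
        // Model with an intercept: prepend the constant term.
        double[] row = new double[regressors.length + 1];
        row[0] = 1.0;
        System.arraycopy(regressors, 0, row, 1, regressors.length);
        return row;
    }
}
```

With noIntercept = false, the input {2.0, 3.0} becomes the row {1.0, 2.0, 3.0};
with noIntercept = true it is passed through unchanged.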

Another thing I forgot to mention is careful specification and validation of 
array shape constraints on input data (i.e., when things have to be rectangular 
and/or of length = previously determined nVars).  I liked the lack of a setter 
for the number of explanatory variables, but that means the first addData 
becomes definitional.

One final suggestion - maybe the row version of addData should be 
addObservation and the matrix version should be addObservations.
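
A rough sketch of how the naming and the shape validation could fit together
follows.  Everything in it is an assumption for illustration: the class name
ObservationBuffer, the getters, and the choice of IllegalArgumentException
are not taken from the attached interfaces.  The first observation fixes
nVars, later rows must match it, and the matrix version checks that its input
is rectangular before anything is added.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of shape validation; not the Commons Math API. */
final class ObservationBuffer {
    private int nVars = -1; // unset until the first observation arrives
    private final List<double[]> rows = new ArrayList<>();
    private final List<Double> responses = new ArrayList<>();

    /** Row version: the first call defines the number of variables. */
    void addObservation(double[] x, double y) {
        if (x == null || x.length == 0) {
            throw new IllegalArgumentException("need at least one regressor");
        }
        if (nVars < 0) {
            nVars = x.length; // first observation is definitional
        } else if (x.length != nVars) {
            throw new IllegalArgumentException(
                "expected " + nVars + " regressors but got " + x.length);
        }
        rows.add(x.clone());
        responses.add(y);
    }

    /** Matrix version: validates every row before adding any of them. */
    void addObservations(double[][] x, double[] y) {
        if (x == null || y == null || x.length != y.length) {
            throw new IllegalArgumentException("x and y need equal row counts");
        }
        int expected = nVars;
        if (expected < 0 && x.length > 0 && x[0] != null) {
            expected = x[0].length; // matrix is the first data seen
        }
        for (double[] row : x) {
            if (row == null || row.length == 0 || row.length != expected) {
                throw new IllegalArgumentException("x must be rectangular");
            }
        }
        for (int i = 0; i < x.length; i++) {
            addObservation(x[i], y[i]);
        }
    }

    int getNumberOfVariables() {
        return nVars;
    }

    int getNumberOfObservations() {
        return rows.size();
    }
}
```

Validating the whole matrix up front means a ragged input is rejected without
leaving the buffer partially updated, which seems like the least surprising
behavior for an updating impl.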


> Current Multiple Regression Object does calculations with all data incore. 
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, 
> lemma
>             Fix For: 3.0
>
>         Attachments: updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete 
> data set. This necessitates the loading incore of the complete dataset. For 
> large datasets, or large datasets and a requirement to do datamining or 
> stepwise regression this is not practical. There are techniques which form 
> the normal equations on the fly, as well as ones which form the QR 
> decomposition on an update basis. I am proposing, first, the specification of 
> an "UpdatingLinearRegression" interface which defines basic functionality all 
> such techniques must fulfill. 
> Related to this 'updating' regression, the results of running a regression on 
> some subset of the data should be encapsulated in an immutable object. This 
> is to ensure that subsequent additions of observations do not corrupt or 
> render inconsistent parameter estimates. I am calling this interface 
> "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the 
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
