[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060792#comment-13060792
 ] 

greg sterijevski commented on MATH-607:
---------------------------------------

One more thing, on the subject of the adjusted R squared: I am not sure I would
include it, since it depends on knowing whether a constant (intercept) term
exists. I currently envision being handed some data. If the data has a column
which is nothing but ones, great. If not, great again. I could not come up with
an elegant way to detect a constant, and therefore no clean way to determine
the Busse R squared.
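The dependence on a constant shows up in the total sum of squares. Under the usual convention, a model with an intercept uses the centered total sum of squares for R squared and n - 1 total degrees of freedom for the adjusted value, while a model without one uses the uncentered sum and n. A minimal sketch, assuming the caller already knows whether a constant is present; the class and method names are hypothetical, not Commons Math API:

```java
/**
 * Hypothetical sketch: R-squared and adjusted R-squared when the caller
 * states whether the model contains a constant term.
 */
final class RSquaredSketch {
    private RSquaredSketch() {}

    /** R^2 = 1 - SSR / SST, where SST is centered only if the model has a constant. */
    static double rSquared(double[] y, double[] fitted, boolean hasConstant) {
        if (y.length != fitted.length) {
            throw new IllegalArgumentException("y and fitted must have the same length");
        }
        double mean = 0.0;
        if (hasConstant) {
            for (double v : y) {
                mean += v;
            }
            mean /= y.length;
        }
        double ssr = 0.0; // residual sum of squares
        double sst = 0.0; // total sum of squares (centered or uncentered)
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - fitted[i];
            ssr += r * r;
            double d = y[i] - mean;
            sst += d * d;
        }
        return 1.0 - ssr / sst;
    }

    /** Adjusted R^2 for n observations and p estimated parameters (including any intercept). */
    static double adjustedRSquared(double r2, int n, int p, boolean hasConstant) {
        // With a constant the total degrees of freedom are n - 1; without one they are n.
        double dfTotal = hasConstant ? n - 1 : n;
        return 1.0 - (1.0 - r2) * dfTotal / (n - p);
    }
}
```

The same fitted values give different answers depending on the flag, which is why the flag cannot simply be guessed.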

I guess we could keep a flag for each regressor: if a regressor's value ever
changes, we would mark it as non-constant. The other approach is to test the
residuals for bias: if there is no bias, then, constant or not, we are okay.
That would be messy, though, since I do not keep the data around. Doesn't
either way make for a bit of unpleasantness that yields very little?
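The per-regressor flag idea needs only the first observation and one boolean per column, so it fits an updating regression that does not keep the data around. A minimal sketch; `ConstantColumnTracker` and its methods are hypothetical names for illustration, not part of Commons Math:

```java
/**
 * Hypothetical sketch of per-regressor "constant" flags maintained while
 * observations stream in. Only the first row is retained.
 */
final class ConstantColumnTracker {
    private double[] firstRow;  // values seen in the first observation
    private boolean[] changed;  // changed[j] becomes true once column j varies

    void addObservation(double[] x) {
        if (firstRow == null) {
            firstRow = x.clone();
            changed = new boolean[x.length];
            return;
        }
        if (x.length != firstRow.length) {
            throw new IllegalArgumentException("inconsistent number of regressors");
        }
        for (int j = 0; j < x.length; j++) {
            // Exact comparison: a column is constant only if every value equals the first.
            if (!changed[j] && x[j] != firstRow[j]) {
                changed[j] = true;
            }
        }
    }

    /** True if column j has held a single value in every observation seen so far. */
    boolean isConstant(int j) {
        if (firstRow == null) {
            throw new IllegalStateException("no observations yet");
        }
        return !changed[j];
    }

    /** True if some nonzero constant column exists, i.e. the data carry an intercept. */
    boolean hasConstant() {
        if (firstRow == null) {
            return false;
        }
        for (int j = 0; j < firstRow.length; j++) {
            if (!changed[j] && firstRow[j] != 0.0) {
                return true;
            }
        }
        return false;
    }
}
```

A column of zeros is excluded from `hasConstant()` because it cannot act as an intercept, while a constant column of any nonzero value can.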

> Current Multiple Regression Object does calculations with all data incore. 
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete
> data set. This requires loading the complete dataset in core (into memory).
> For large datasets, especially when data mining or stepwise regression is
> also required, this is not practical. There are techniques which form the
> normal equations on the fly, as well as ones which update the QR
> decomposition incrementally. I am proposing, first, the specification of an
> "UpdatingLinearRegression" interface which defines the basic functionality
> all such techniques must provide.
> Related to this 'updating' regression, the results of running a regression on
> some subset of the data should be encapsulated in an immutable object. This
> ensures that adding further observations cannot corrupt the parameter
> estimates or render them inconsistent. I am calling this interface
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the 
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg
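The quoted proposal mentions techniques that form the normal equations on the fly, plus an immutable results object. A minimal sketch of both ideas follows; only the names `UpdatingLinearRegression` and `RegressionResults` come from the proposal, while every method signature and the `NormalEquationsRegression` class are hypothetical (and `RegressionResults` is shown as a final class rather than an interface for brevity). It accumulates X'X and X'y, so memory grows with the number of regressors rather than the number of observations. Forming X'X squares the condition number, which is one reason updating-QR approaches such as Gentleman's are usually preferred numerically.

```java
/** Hypothetical immutable snapshot; later observations cannot change it. */
final class RegressionResults {
    private final double[] beta;
    private final long n;

    RegressionResults(double[] beta, long n) {
        this.beta = beta.clone();
        this.n = n;
    }

    double[] getParameterEstimates() { return beta.clone(); }
    long getN() { return n; }
}

/** Hypothetical shape of the proposed interface. */
interface UpdatingLinearRegression {
    void addObservation(double[] x, double y);
    long getN();
    RegressionResults regress();
}

/** Accumulates X'X and X'y one observation at a time: O(p^2) memory. */
final class NormalEquationsRegression implements UpdatingLinearRegression {
    private final int p;
    private final double[][] xtx;
    private final double[] xty;
    private long n;

    NormalEquationsRegression(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    @Override
    public void addObservation(double[] x, double y) {
        if (x.length != p) throw new IllegalArgumentException("expected " + p + " regressors");
        for (int i = 0; i < p; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < p; j++) xtx[i][j] += x[i] * x[j];
        }
        n++;
    }

    @Override
    public long getN() { return n; }

    @Override
    public RegressionResults regress() {
        // Work on copies so solving never disturbs the running accumulators.
        double[][] a = new double[p][];
        for (int i = 0; i < p; i++) a[i] = xtx[i].clone();
        double[] b = xty.clone();
        // Gaussian elimination with partial pivoting on X'X beta = X'y.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            // Crude, scale-dependent singularity threshold; adequate for a sketch only.
            if (Math.abs(a[pivot][col]) < 1e-12) throw new ArithmeticException("X'X is singular");
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double bTmp = b[col]; b[col] = b[pivot]; b[pivot] = bTmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < p; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        // Back substitution on the upper-triangular system.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        return new RegressionResults(beta, n);
    }
}
```

Because `regress()` copies the estimates into a new `RegressionResults`, a snapshot taken after four observations keeps its values even after a fifth observation is added.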

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
