[
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060792#comment-13060792
]
greg sterijevski commented on MATH-607:
---------------------------------------
One more thing, on the subject of the adjusted R squared: I am not sure I would
include it, since it depends on knowing whether a constant exists. I currently
envision being handed some data. If the data has a column which is nothing but
ones, great. If not, great again. I could not come up with an elegant way to
handle constant detection, and therefore a clean way to determine the Busse R
squared.
I guess we could keep a flag for each regressor. If the regressor's value ever
changes, we would say it is not a constant. The other approach is to test the
residuals for bias: if there is no bias, then constant or not, we are okay.
That would be messy, though, since I do not keep the data around. Either way
makes for a bit of unpleasantness that yields very little.
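
For concreteness, the flag-per-regressor idea might look roughly like the
sketch below. The class and method names (ConstantColumnTracker,
addObservation, isConstant, hasConstant) are made up for illustration and are
not part of any proposed interface. It only remembers the first row and one
boolean per regressor, so no data has to be kept around.

```java
/**
 * Tracks, one observation at a time, which regressors have never changed value.
 * Hypothetical helper: all names here are illustrative assumptions.
 */
final class ConstantColumnTracker {
    private final double[] first;    // each regressor's value in the first observation
    private final boolean[] changed; // set once a regressor differs from its first value
    private long n;                  // number of observations seen so far

    ConstantColumnTracker(int numRegressors) {
        first = new double[numRegressors];
        changed = new boolean[numRegressors];
    }

    /** Updates the flags with one row of regressors; the row itself is not stored. */
    void addObservation(double[] x) {
        if (x.length != first.length) {
            throw new IllegalArgumentException("expected " + first.length + " regressors");
        }
        if (n == 0) {
            System.arraycopy(x, 0, first, 0, x.length);
        } else {
            for (int j = 0; j < x.length; j++) {
                if (x[j] != first[j]) {
                    changed[j] = true;
                }
            }
        }
        n++;
    }

    /** True if regressor j has held a single value over all observations so far. */
    boolean isConstant(int j) {
        return n > 0 && !changed[j];
    }

    /** True if some regressor is a nonzero constant, i.e. can act as an intercept. */
    boolean hasConstant() {
        for (int j = 0; j < first.length; j++) {
            if (isConstant(j) && first[j] != 0.0) {
                return true;
            }
        }
        return false;
    }
}
```

This only catches a literal constant column. A constant that is merely in the
span of the regressors (say, a full set of dummy variables) would go
unnoticed, which is where the residual-bias test would come in.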
> Current Multiple Regression Object does calculations with all data incore.
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MATH-607
> URL: https://issues.apache.org/jira/browse/MATH-607
> Project: Commons Math
> Issue Type: New Feature
> Affects Versions: 3.0
> Environment: Java
> Reporter: greg sterijevski
> Labels: Gentleman's, QR, Regression, Updating, decomposition,
> lemma
> Fix For: 3.0
>
> Attachments: updating_reg_ifaces
>
> Original Estimate: 840h
> Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete
> data set. This requires loading the complete dataset in core. For large
> datasets, especially when combined with a requirement to do data mining or
> stepwise regression, this is not practical. There are techniques which form
> the normal equations on the fly, as well as ones which update the QR
> decomposition as observations arrive. I am proposing, first, the
> specification of an "UpdatingLinearRegression" interface which defines the
> basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on
> some subset of the data should be encapsulated in an immutable object. This
> is to ensure that subsequent additions of observations do not corrupt or
> render inconsistent parameter estimates. I am calling this interface
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg
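
To make the quoted proposal a little more concrete, here is a minimal sketch of
how the two interfaces could fit together. Only the names
"UpdatingLinearRegression" and "RegressionResults" come from the issue; every
method signature, and the NormalEquationsRegression class, is an assumption
made for illustration. It uses the normal-equations variant because it is the
shortest to write down; an updating QR scheme would be numerically safer.

```java
/** Immutable snapshot of a fit. The name is from the issue; the members are assumptions. */
final class RegressionResults {
    private final double[] parameters;
    private final long n;

    RegressionResults(double[] parameters, long n) {
        this.parameters = parameters.clone(); // defensive copy keeps the snapshot immutable
        this.n = n;
    }

    double[] getParameters() { return parameters.clone(); }

    long getN() { return n; }
}

/** Hypothetical shape of the proposed interface. */
interface UpdatingLinearRegression {
    void addObservation(double[] x, double y);

    RegressionResults regress();
}

/** Accumulates X'X and X'y on the fly; no observation is retained. */
final class NormalEquationsRegression implements UpdatingLinearRegression {
    private final int k;
    private final double[][] xtx;
    private final double[] xty;
    private long n;

    NormalEquationsRegression(int numRegressors) {
        k = numRegressors;
        xtx = new double[k][k];
        xty = new double[k];
    }

    @Override
    public void addObservation(double[] x, double y) {
        if (x.length != k) {
            throw new IllegalArgumentException("expected " + k + " regressors");
        }
        for (int i = 0; i < k; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < k; j++) {
                xtx[i][j] += x[i] * x[j];
            }
        }
        n++;
    }

    @Override
    public RegressionResults regress() {
        // Work on a copy [X'X | X'y] so later observations can still be added.
        double[][] a = new double[k][k + 1];
        for (int i = 0; i < k; i++) {
            System.arraycopy(xtx[i], 0, a[i], 0, k);
            a[i][k] = xty[i];
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("normal equations are singular");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[k];
        for (int i = k - 1; i >= 0; i--) {
            double s = a[i][k];
            for (int j = i + 1; j < k; j++) {
                s -= a[i][j] * beta[j];
            }
            beta[i] = s / a[i][i];
        }
        return new RegressionResults(beta, n);
    }
}
```

Because regress() hands back a copy of the estimates, a RegressionResults
obtained after the first N observations stays unchanged when more observations
are added afterwards, which is the consistency property the description asks
for.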