[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060791#comment-13060791 ]
Phil Steitz commented on MATH-607:
----------------------------------
I get your point on the Results interface. It did not look "large" to me at
first (i.e., generally O(vars) rather than O(obs)). If it could get "large",
it would indeed be better to leave it as an interface. The problem there is
getting it right the first time, because interfaces are very hard to change.
My sense at this point is that we may want to revise this a few times before
it is really stable, so a concrete class would be better to start with. Also,
having the "value" class is handy. StatisticalSummaryValues is an example of
that (it implements the interface that preceded it, so maybe having both is a
good longer-term solution). If it turns out to be too unwieldy to create the
results factory methods, I am OK starting with the interface approach, but in
that case we should review it very carefully prior to release.
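
To make the "interface plus value class" idea concrete, here is a rough
sketch of how the two could sit side by side, in the spirit of
StatisticalSummary / StatisticalSummaryValues. The interface methods and the
name RegressionResultsValues are assumptions for illustration only; they are
not taken from the attached proposal.

```java
import java.util.Arrays;

// Hypothetical, deliberately small version of the results interface.
interface RegressionResults {
    double[] getParameterEstimates();
    long getN();
}

// Hypothetical immutable "value" class implementing the interface.
final class RegressionResultsValues implements RegressionResults {
    private final double[] parameters;
    private final long n;

    RegressionResultsValues(double[] parameters, long n) {
        this.parameters = parameters.clone(); // defensive copy on the way in
        this.n = n;
    }

    @Override
    public double[] getParameterEstimates() {
        return parameters.clone(); // defensive copy on the way out
    }

    @Override
    public long getN() {
        return n;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RegressionResultsValues)) {
            return false;
        }
        RegressionResultsValues other = (RegressionResultsValues) o;
        return n == other.n && Arrays.equals(parameters, other.parameters);
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(n) + Arrays.hashCode(parameters);
    }
}
```

With this shape, the concrete class can be revised freely while callers that
only need read access code against the interface.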
I did not mean to suggest that UpdatingOLSRegression should be an abstract
class. If and when a weighted or non-OLS updating regression is implemented,
we might consider introducing an abstract parent, but I would need to see a
good reason for this. IMO, what we have now for OLS and WLS (I mean the
abstract superclass and interface) is of marginal value.
> Current Multiple Regression Object does calculations with all data incore.
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MATH-607
> URL: https://issues.apache.org/jira/browse/MATH-607
> Project: Commons Math
> Issue Type: New Feature
> Affects Versions: 3.0
> Environment: Java
> Reporter: greg sterijevski
> Labels: Gentleman's, QR, Regression, Updating, decomposition,
> lemma
> Fix For: 3.0
>
> Attachments: updating_reg_ifaces
>
> Original Estimate: 840h
> Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete
> data set. This necessitates the loading incore of the complete dataset. For
> large datasets, or large datasets and a requirement to do datamining or
> stepwise regression this is not practical. There are techniques which form
> the normal equations on the fly, as well as ones which form the QR
> decomposition on an update basis. I am proposing, first, the specification of
> an "UpdatingLinearRegression" interface which defines basic functionality all
> such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on
> some subset of the data should be encapsulated in an immutable object. This
> is to ensure that subsequent additions of observations do not corrupt or
> render inconsistent parameter estimates. I am calling this interface
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg
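
For reference, the "normal equations on the fly" idea from the description
can be sketched as below. This is only an illustration under stated
assumptions: the method names on UpdatingLinearRegression and the class
NormalEquationsRegression are hypothetical, and the solver is plain Gaussian
elimination rather than the updating QR approach the issue also mentions.
Normal equations can be poorly conditioned, so a real implementation would
likely prefer the QR route.

```java
// Hypothetical interface; method names are assumptions for illustration.
interface UpdatingLinearRegression {
    void addObservation(double[] x, double y);
    long getN();
    double[] regress(); // returns a fresh array of parameter estimates
}

// Accumulates X'X and X'y one observation at a time, so the full data set
// never has to be held in memory.
final class NormalEquationsRegression implements UpdatingLinearRegression {
    private final int p;          // number of regressors (include a 1.0 column for an intercept)
    private final double[][] xtx; // running X'X
    private final double[] xty;   // running X'y
    private long n;

    NormalEquationsRegression(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    @Override
    public void addObservation(double[] x, double y) {
        if (x.length != p) {
            throw new IllegalArgumentException("expected " + p + " regressors");
        }
        for (int i = 0; i < p; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < p; j++) {
                xtx[i][j] += x[i] * x[j];
            }
        }
        n++;
    }

    @Override
    public long getN() {
        return n;
    }

    @Override
    public double[] regress() {
        // Work on copies so the accumulated state is left untouched.
        double[][] a = new double[p][];
        for (int i = 0; i < p; i++) {
            a[i] = xtx[i].clone();
        }
        double[] b = xty.clone();
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("X'X is singular or nearly so");
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = b[i];
            for (int c = i + 1; c < p; c++) {
                s -= a[i][c] * beta[c];
            }
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}
```

Because regress() returns a newly allocated array, estimates obtained before
further calls to addObservation() are not affected by them, which is the
property the immutable results object is meant to guarantee.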