[
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069298#comment-13069298
]
Phil Steitz edited comment on MATH-607 at 7/21/11 11:56 PM:
------------------------------------------------------------
Don't worry about the exceptions. The only thing remaining there is to fit
into the hierarchy. I will fix that. Feel free to weigh in on the ML thread,
though.
I am still not sold on the globalStats array. It is just not the Java way to
use arrays indexed by static constants to represent properties. I agree
strongly that we are going to want to add more fields to RegressionResults. In
the public API, we are going to want them to be properties, though. Why would
a user ever want to use getGlobalStats[THING_I_WANT] instead of getThingIWant?
It is actually more code to maintain the enum plus the array than to maintain
plain fields. So I guess I disagree with a) and d) above. I get b), but don't
see it as a big deal or as enough to change the API. I also get c), but it
frankly looks a little scary. I have been burned so many times over the years
by indices into blocks of storage, variable-content hashmaps of attributes,
etc., that I try to avoid these things in my code. And think about the change
in c) in any case: it goes from globalStats[OLD_INDEX] to
globalStats[NEW_INDEX] somehow through the API, when the same change is really
s/oldProperty/newProperty, likely at the same entry point.
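To make the contrast concrete, here is a minimal sketch of the two access
styles side by side. All class, constant, and method names below are
hypothetical and chosen for illustration only; they are not the code attached
to this issue or the Commons Math API.

```java
// Hypothetical names throughout; an illustration, not the Commons Math API.

/** Index style: statistics live in an array addressed by int constants. */
final class ArrayStyleResults {
    static final int SSE = 0;        // sum of squared errors
    static final int R_SQUARED = 1;  // coefficient of determination

    private final double[] globalStats;

    ArrayStyleResults(double sse, double rSquared) {
        this.globalStats = new double[] {sse, rSquared};
    }

    /** Any int compiles here; a wrong index only fails at run time. */
    double getGlobalStat(int index) {
        return globalStats[index];
    }
}

/** Property style: one named field and getter per statistic. */
final class PropertyStyleResults {
    private final double errorSumSquares;
    private final double rSquared;

    PropertyStyleResults(double errorSumSquares, double rSquared) {
        this.errorSumSquares = errorSumSquares;
        this.rSquared = rSquared;
    }

    /** A misspelled or removed property is a compile-time error. */
    double getErrorSumSquares() {
        return errorSumSquares;
    }

    double getRSquared() {
        return rSquared;
    }
}
```

With the index style, results.getGlobalStat(7) compiles and then throws
ArrayIndexOutOfBoundsException; with the property style there is nothing
equivalent to get wrong.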
If we think that the globalFit stuff needs to be encapsulated within
RegressionResults, I would be fine with defining another class to hold global
model fit statistics. This class would then have the fit properties as fields.
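A rough sketch of what such a holder class could look like follows. The names
GlobalFitStatistics and RegressionResultsSketch, the choice of fields, and the
derived getters are all assumptions made for illustration, not a proposal for
the final API.

```java
// Hypothetical sketch; class names and the field set are assumptions.

/** Immutable holder for global model fit statistics, exposed as properties. */
final class GlobalFitStatistics {
    private final double errorSumSquares;
    private final double totalSumSquares;
    private final long numberOfObservations;
    private final int numberOfParameters;

    GlobalFitStatistics(double errorSumSquares, double totalSumSquares,
                        long numberOfObservations, int numberOfParameters) {
        this.errorSumSquares = errorSumSquares;
        this.totalSumSquares = totalSumSquares;
        this.numberOfObservations = numberOfObservations;
        this.numberOfParameters = numberOfParameters;
    }

    double getErrorSumSquares() {
        return errorSumSquares;
    }

    double getTotalSumSquares() {
        return totalSumSquares;
    }

    /** R^2 = 1 - SSE / SST. */
    double getRSquared() {
        return 1.0 - errorSumSquares / totalSumSquares;
    }

    /** MSE = SSE / (n - p). */
    double getMeanSquaredError() {
        return errorSumSquares / (numberOfObservations - numberOfParameters);
    }
}

/** Results object that delegates global fit properties to the holder. */
final class RegressionResultsSketch {
    private final double[] parameters;
    private final GlobalFitStatistics globalFit;

    RegressionResultsSketch(double[] parameters, GlobalFitStatistics globalFit) {
        this.parameters = parameters.clone(); // defensive copy keeps this immutable
        this.globalFit = globalFit;
    }

    double getParameterEstimate(int i) {
        return parameters[i];
    }

    GlobalFitStatistics getGlobalFit() {
        return globalFit;
    }
}
```

Adding a new fit statistic later then means adding one field and one getter to
the holder class, without touching the RegressionResults signature.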
> Current Multiple Regression Object does calculations with all data incore.
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MATH-607
> URL: https://issues.apache.org/jira/browse/MATH-607
> Project: Commons Math
> Issue Type: New Feature
> Affects Versions: 3.0
> Environment: Java
> Reporter: greg sterijevski
> Labels: Gentleman's, QR, Regression, Updating, decomposition,
> lemma
> Fix For: 3.0
>
> Attachments: RegressResults2, millerreg, millerreg_take2,
> millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
> Original Estimate: 840h
> Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete
> data set. This requires loading the complete dataset in core. For large
> datasets, especially when data mining or stepwise regression is also
> required, this is not practical. There are techniques which form the normal
> equations on the fly, as well as ones which update the QR decomposition as
> observations arrive. I am proposing, first, the specification of an
> "UpdatingLinearRegression" interface which defines the basic functionality
> all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on
> some subset of the data should be encapsulated in an immutable object. This
> is to ensure that subsequent additions of observations do not corrupt or
> render inconsistent parameter estimates. I am calling this interface
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg
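
For readers following the issue description quoted above, here is a minimal
sketch of the idea for the single-regressor case: the normal equations are
accumulated as running sums, so no observation is kept in memory, and each
call to regress() returns an immutable snapshot. Only the interface name comes
from the issue; the method names, SimpleFit, and RunningSumsRegression are
assumptions for illustration. Raw running sums are also numerically weaker
than the QR updating techniques the issue mentions.

```java
// Hypothetical sketch for one regressor; method names are assumptions.

/** Immutable snapshot of a fit; later observations cannot change it. */
final class SimpleFit {
    private final double intercept;
    private final double slope;
    private final long numberOfObservations;

    SimpleFit(double intercept, double slope, long numberOfObservations) {
        this.intercept = intercept;
        this.slope = slope;
        this.numberOfObservations = numberOfObservations;
    }

    double getIntercept() {
        return intercept;
    }

    double getSlope() {
        return slope;
    }

    long getNumberOfObservations() {
        return numberOfObservations;
    }
}

/** Basic contract: feed observations one at a time, fit on demand. */
interface UpdatingLinearRegression {
    void addObservation(double x, double y);

    SimpleFit regress();
}

/** Accumulates the normal equations for y = a + b*x as running sums. */
final class RunningSumsRegression implements UpdatingLinearRegression {
    private long n;
    private double sumX;
    private double sumY;
    private double sumXX;
    private double sumXY;

    @Override
    public void addObservation(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    @Override
    public SimpleFit regress() {
        double det = n * sumXX - sumX * sumX;
        if (n < 2 || det == 0.0) {
            throw new IllegalStateException("not enough distinct observations");
        }
        double slope = (n * sumXY - sumX * sumY) / det;
        double intercept = (sumY - slope * sumX) / n;
        return new SimpleFit(intercept, slope, n);
    }
}
```

A snapshot taken after three observations keeps its estimates even if more
observations are added to the same RunningSumsRegression afterwards.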