[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062832#comment-13062832 ]

Phil Steitz commented on MATH-607:
----------------------------------

Thanks, Greg!

I committed the patch with minor modifications to make it (almost) consistent 
with the [math] style guidelines.  (Running "mvn site" and looking at the 
checkstyle report shows where the style problems are in a patch.)  I didn't 
make any really substantive changes, but there is still some work to be done.  
I wanted to get the classes committed, though, so we could start the 
implementation work and refine them as we go.

Here is what still needs attention in the interface/value classes:

1) There is some missing javadoc.
2) I made the static constants for the overall stats private in 
RegressionResults.  I did not see any use for them outside of the class, and in 
fact I think it would likely be better to replace the internal array 
representation of those data with an inner class with proper field names, or 
just define separate data members (see the first sketch after this list).  
Maybe you see that array as having variable length for some models?  I am OK 
leaving it as is for now, but let's keep it all private.
3) We can wait to fix this until we know exactly what is going to come out 
of the implementations, but we need to fit the exceptions into the [math] 
hierarchy and be explicit in the throws clauses (see the second sketch below).
4) There are a couple of references in the javadoc to "redundancy flags", but 
these are not actually available in RegressionResults.  The references should 
probably be dropped, and subclasses that expose these flags can be added for 
the models that include them.
5) The precondition statements are good to retain, but I don't think they 
actually belong in the RegressionResults javadoc.  Most likely they should be 
in the javadoc for either UpdatingRegression#regress or the implementations.
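
To make 2) concrete, here is a rough sketch of the kind of inner value holder 
I have in mind.  The field names and the accessor below are placeholders of my 
own, not what is in the committed class:

    // Rough sketch only: an inner value holder with named fields in place of
    // the private double[] of overall stats; field names are placeholders.
    public class RegressionResults {

        /** Immutable holder for the overall fit statistics. */
        private static final class OverallFit {
            private final double sse;       // sum of squared errors
            private final double sst;       // total sum of squares
            private final double rSquared;  // coefficient of determination
            private final long n;           // number of observations

            OverallFit(double sse, double sst, double rSquared, long n) {
                this.sse = sse;
                this.sst = sst;
                this.rSquared = rSquared;
                this.n = n;
            }
        }

        private final OverallFit overallFit;

        RegressionResults(OverallFit overallFit) {
            this.overallFit = overallFit;
        }

        /** Accessors read named fields instead of indexing into an array. */
        public double getRSquared() {
            return overallFit.rSquared;
        }
    }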
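
And for 3), roughly the form the throws clauses could take, assuming we settle 
on one of the existing [math] exceptions.  The exception type and the 
precondition in the javadoc are examples only, not decisions:

    import org.apache.commons.math.exception.MathIllegalArgumentException;

    // Illustrative only: an explicit throws clause using an existing [math]
    // exception; the actual exception types are still to be decided.
    public interface UpdatingRegression {

        /**
         * Estimates the model from the observations added so far.
         *
         * @return an immutable RegressionResults snapshot
         * @throws MathIllegalArgumentException if too few observations have
         * been added to estimate the parameters (example precondition only)
         */
        RegressionResults regress() throws MathIllegalArgumentException;
    }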

Thanks for the patch!

> Current Multiple Regression Object does calculations with all data incore. 
> There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, 
> lemma
>             Fix For: 3.0
>
>         Attachments: updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete 
> data set.  This necessitates loading the complete dataset incore.  For large 
> datasets, or where data mining or stepwise regression is required, this is 
> not practical.  There are techniques which form the normal equations on the 
> fly, as well as ones which form the QR decomposition on an updating basis.  I 
> am proposing, first, the specification of an "UpdatingLinearRegression" 
> interface which defines the basic functionality all such techniques must 
> fulfill.
> Related to this 'updating' regression, the results of running a regression on 
> some subset of the data should be encapsulated in an immutable object.  This 
> is to ensure that subsequent additions of observations do not corrupt the 
> parameter estimates or render them inconsistent.  I am calling this interface 
> "RegressionResults".
> Once the community has reached a consensus on the interface, work on the 
> concrete implementation of these techniques will take place.
> Thanks,
> -Greg
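
To make the proposal above concrete, a minimal sketch of the kind of interface 
being described, assuming an add-one-observation-at-a-time style of API.  The 
names and signatures below are illustrative placeholders, not the interfaces in 
the attached patches:

    // Hypothetical shape only; method names and signatures are placeholders,
    // not the interfaces from the attached patches.
    public interface UpdatingLinearRegression {

        /** Feeds one observation (regressor values x, response y) into the
         *  running decomposition or normal equations. */
        void addObservation(double[] x, double y);

        /** Returns the number of observations processed so far. */
        long getN();

        /** Estimates the model from everything seen so far, returned as an
         *  immutable snapshot so later updates cannot change it. */
        RegressionResults regress();
    }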

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira