[ 
https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088421#comment-13088421
 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Code has been committed in r1160026 with the following additional changes 
(beyond refactoring above):

1. Eliminated use of deprecated MathRuntimeException (I know, this usage was 
copied from Covariance, which also needs to be fixed.  I will do that.)
2. Changed N to n as field name and other minor formatting
3. Changed error message to INSUFFICIENT_OBSERVED_POINTS_IN_SAMPLE from 
INSUFFICIENT_DIMENSIONS for insufficient data in the bivariate case.
4. Eliminated the try-catch blocks in getData, getCovarianceMatrix in 
StorelessCovariance (renamed).  These both advertise and throw 
IllegalArgumentException and I saw no reason to wrap what was being caught and 
rethrown.
5. Renamed getEntry, setEntry, incrementEntry to xXCovariance in 
StorelessCovariance.

Some more notes: 

In making StorelessCovariance extend Covariance, I had to override getN to 
throw UnsupportedOperationException, since there is no global N defined in the 
storeless implementation. I guess alternatively, we could return the min among 
the bivariate covariances.  More on this below.

I did not improve or merge the tests, which we should also eventually do - 
i.e., make StorlessCovarianceTest extend CovarianceTest, refactoring the base 
class tests so they can be applied to the subclass.  To do this, we need to 
feed the data incrementally to the storeless version, separating the data 
provisioning from validation in the tests.

Sorry I did not notice this before, but we need to do something about the 
potential lack of integrity of the (virtual) covariance matrix exposed.  
Currently, incrementing (i,j) does nothing to (j,i) so the virtual matrix is 
not even guaranteed to be symmetric.  I thought about dispatching calls to 
ensure this, but when you think about all of the constraints involved in 
ensuring what we expose is a valid covariance matrix, I am leaning toward 
recommending instead that we do not allow the granular, pairwise incrementing 
or individual bivariate covariance setters; but instead keep only the increment 
method that requires a full vector of new values.  This will guarantee data 
integrity (and also as a bonus, restore meaningfulness of getN).  What do you 
think about this?  If we don't do it, we need to at least ensure minimally that 
the matrix is symmetric and we should probably remove the setEntry method.  My 
recommendation is that we 

a) Drop setEntry and incrementCovariance.  Rename incrementRow to increment and 
have that the only mutator.
b) Replace colDimension and rowDimension with just dimension, forcing the 
matrix to be square.
c) Store only upper triangular BivariateCovariances.  Add a transpose method to 
StorelessBivariateCovariance so getEntry returns something that can be further 
incemented properly.
d) Add symmetry tests




> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently there is no storeless version for computing the covariance. 
> However, Pebay (2008) describes algorithms for on-line covariance 
> computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have 
> provided a simple class for implementing this algorithm. It would be nice to 
> have this integrated into org.apache.commons.math.stat.correlation.Covariance.
> {code}
> //This code is granted for inclusion in the Apache Commons under the terms of 
> the ASL.
> public class StorelessCovariance{
>     private double deltaX = 0.0;
>     private double deltaY = 0.0;
>     private double meanX = 0.0;
>     private double meanY = 0.0;
>     private double N=0;
>     private Double covarianceNumerator=0.0;
>     private boolean unbiased=true;
>     public Covariance(boolean unbiased){
>       this.unbiased = unbiased;
>     }
>     public void increment(Double x, Double y){
>         if(x!=null & y!=null){
>             N++;
>             deltaX = x - meanX;
>             deltaY = y - meanY;
>             meanX += deltaX/N;
>             meanY += deltaY/N;
>             covarianceNumerator += ((N-1.0)/N)*deltaX*deltaY;
>         }
>         
>     }
>     public Double getResult(){
>         if(unbiased){
>             return covarianceNumerator/(N-1.0);
>         }else{
>             return covarianceNumerator/N;
>         }
>     }   
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to