[ https://issues.apache.org/jira/browse/MATH-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835942#comment-16835942 ]
Gilles commented on MATH-1482:
------------------------------

Thanks for your proposal.
It just happens that this part of the code is going to be refactored and ported to a new ["Commons Statistics"|http://commons.apache.org/proper/commons-statistics/] component. This is a GSoC project being discussed right now on the "dev" mailing list.
Your use case is certainly welcome and will help shape the new design. You can make sure that it will be taken into account by subscribing to the ML and starting a discussion over there.

> Pull request for GLSMultipleLinearRegression
> --------------------------------------------
>
>                 Key: MATH-1482
>                 URL: https://issues.apache.org/jira/browse/MATH-1482
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Elena Kartysheva
>            Priority: Trivial
>
> I would like to propose a pull request implementing an option to use a variance
> vector instead of a covariance matrix. It lets users avoid unnecessary memory
> usage and excessive computation in the case of uncorrelated but
> heteroscedastic errors, which makes it possible to work with huge input
> matrices. Using a variance vector in such cases reduces the time
> complexity from O(n^2) to just O(n) (where n is the number of observations) and
> dramatically reduces memory usage. For example, in my own work I needed to
> train a generalized linear model. The iteratively reweighted least squares (IRLS)
> algorithm requires a weighted regression with more than a million observations.
> The current implementation would require approximately 12 terabytes of memory,
> while the patched version needs only 8 megabytes. Since IRLS is an iterative
> algorithm, a million-fold reduction in complexity is also pretty handy.
> https://github.com/apache/commons-math/pull/106
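The variance-vector option described in the issue is, in effect, weighted least squares. With a diagonal covariance Ω = diag(σ₁², ..., σₙ²), the GLS estimator reduces to minimizing Σᵢ (yᵢ - xᵢ·β)² / σᵢ², so each observation only needs its own variance. Below is a minimal plain-Java sketch of that idea, assuming dense `double[][]` inputs. The class `VarianceVectorWls` and its `fit` method are hypothetical illustrations, not the code in pull request #106 and not a Commons Math API. For brevity it accumulates and solves the normal equations; a production version would more likely apply a QR decomposition to the rows rescaled by 1/σᵢ, for better numerical stability.

```java
/**
 * Hypothetical helper (not part of Commons Math): weighted least squares
 * driven by a per-observation variance vector, i.e. GLS with a diagonal
 * covariance matrix, without ever building an n-by-n matrix.
 */
final class VarianceVectorWls {

    /**
     * Returns beta minimizing sum_i (y[i] - x[i].beta)^2 / variance[i].
     * A single pass over the n rows accumulates X^T W X (p-by-p) and
     * X^T W y (length p), with W = diag(1 / variance), so the cost is
     * O(n p^2) time and O(p^2) extra memory.
     */
    static double[] fit(double[] y, double[][] x, double[] variance) {
        int n = y.length;
        if (n == 0 || x.length != n || variance.length != n) {
            throw new IllegalArgumentException("y, x and variance must be non-empty and of equal length");
        }
        int p = x[0].length;
        double[][] a = new double[p][p]; // accumulates X^T W X
        double[] b = new double[p];      // accumulates X^T W y
        for (int i = 0; i < n; i++) {
            if (!(variance[i] > 0)) { // also rejects NaN
                throw new IllegalArgumentException("variance must be positive at row " + i);
            }
            double w = 1.0 / variance[i]; // weight of observation i
            for (int j = 0; j < p; j++) {
                double wxj = w * x[i][j];
                b[j] += wxj * y[i];
                for (int k = 0; k < p; k++) {
                    a[j][k] += wxj * x[i][k];
                }
            }
        }
        return solve(a, b);
    }

    /** Solves a * beta = b by Gaussian elimination with partial pivoting (modifies a and b). */
    private static double[] solve(double[][] a, double[] b) {
        int p = b.length;
        for (int col = 0; col < p; col++) {
            // Choose the largest pivot in this column to limit round-off growth.
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (a[pivot][col] == 0.0) {
                throw new ArithmeticException("X^T W X is singular");
            }
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double bTmp = b[col]; b[col] = b[pivot]; b[pivot] = bTmp;
            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution on the upper-triangular system.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * beta[c];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }

    public static void main(String[] args) {
        // Intercept column plus one regressor; the points lie exactly on y = 1 + 2x,
        // so any positive weights recover intercept 1 and slope 2.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {1, 3, 5, 7};
        double[] variance = {1.0, 4.0, 0.25, 2.0};
        double[] beta = fit(y, x, variance);
        System.out.printf("intercept=%.6f slope=%.6f%n", beta[0], beta[1]);
    }
}
```

Running `main` prints `intercept=1.000000 slope=2.000000`. The only per-observation storage is the variance array itself: at n = 10^6 doubles that is 8 MB, whereas a dense n-by-n covariance matrix alone holds 10^12 doubles, i.e. 8 TB before any working copies.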