Mahout has this.

We have an LSMR implementation that accepts a generic linear operator.
You can implement this linear operator as an out-of-core multiplication or
as a cluster operation.
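
To make the "generic linear operator" idea concrete, here is a minimal,
self-contained sketch. It does not use Mahout's actual classes: the
LinearOperator interface, RowStreamOperator, and cgls() are all hypothetical
names made up for illustration. The solver is plain CGLS (conjugate gradient
on the normal equations), swapped in for LSMR because it is shorter; like
LSMR, it touches the matrix only through products with A and A', which is
what lets the rows live on disk or on a cluster. The in-memory list below
merely stands in for a data source that is re-read on every pass.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class MatrixFreeLeastSquares {

    /** Hypothetical contract: the solver only ever asks for A*x and A'*y. */
    interface LinearOperator {
        int rows();
        int cols();
        double[] times(double[] x);           // A * x
        double[] transposeTimes(double[] y);  // A' * y
    }

    /** Streams one row at a time; the supplier stands in for re-opening a file. */
    static final class RowStreamOperator implements LinearOperator {
        private final Supplier<Iterator<double[]>> rowSource;
        private final int rows;
        private final int cols;

        RowStreamOperator(Supplier<Iterator<double[]>> rowSource, int rows, int cols) {
            this.rowSource = rowSource;
            this.rows = rows;
            this.cols = cols;
        }

        public int rows() { return rows; }
        public int cols() { return cols; }

        public double[] times(double[] x) {
            double[] y = new double[rows];
            Iterator<double[]> it = rowSource.get();
            for (int i = 0; i < rows; i++) {
                y[i] = dot(it.next(), x);
            }
            return y;
        }

        public double[] transposeTimes(double[] y) {
            double[] z = new double[cols];
            Iterator<double[]> it = rowSource.get();
            for (int i = 0; i < rows; i++) {
                double[] row = it.next();
                for (int j = 0; j < cols; j++) {
                    z[j] += row[j] * y[i];
                }
            }
            return z;
        }
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** CGLS for min ||A x - b||; stops when ||A' r||^2 drops to tol or below. */
    static double[] cgls(LinearOperator a, double[] b, int maxIter, double tol) {
        double[] x = new double[a.cols()];
        double[] r = b.clone();               // residual b - A x, with x = 0
        double[] s = a.transposeTimes(r);     // gradient direction A' r
        double[] p = s.clone();
        double gamma = dot(s, s);
        for (int k = 0; k < maxIter && gamma > tol; k++) {
            double[] q = a.times(p);
            double alpha = gamma / dot(q, q);
            for (int j = 0; j < x.length; j++) x[j] += alpha * p[j];
            for (int i = 0; i < r.length; i++) r[i] -= alpha * q[i];
            s = a.transposeTimes(r);
            double gammaNew = dot(s, s);
            double beta = gammaNew / gamma;
            for (int j = 0; j < p.length; j++) p[j] = s[j] + beta * p[j];
            gamma = gammaNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // Toy data: b = 1 + 2*t for t = 0..3, so the fit should be close to (1, 2).
        List<double[]> rows = Arrays.asList(
            new double[] {1, 0}, new double[] {1, 1},
            new double[] {1, 2}, new double[] {1, 3});
        double[] b = {1, 3, 5, 7};
        LinearOperator a = new RowStreamOperator(rows::iterator, 4, 2);
        System.out.println(Arrays.toString(cgls(a, b, 50, 1e-18)));
    }
}
```

Each iteration makes one pass over the data for A*p and one for A'*r, so
memory use is proportional to the number of columns plus the number of rows
of b, not to the size of the matrix itself.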

You don't say how large you want the system to be or whether you have sparse
data.  That might change the answer.

See http://www.stanford.edu/group/SOL/software/lsmr.html

On Fri, Jun 24, 2011 at 11:44 AM, Greg Sterijevski
<gsterijev...@gmail.com> wrote:

> Hello All,
>
> I have been a user of the math commons jar for a little over a year and am
> very impressed with it. I was wondering whether anyone is actively working
> on implementing functionality to do regressions on very very large data
> sets. The current implementation of the OLS routine is an in-core QR
> decomposition with substitution. While the solutions are typically accurate,
> the in-core nature limits the usefulness of these objects.
>
> Looking through the code, most of the implementation of an InputStream-based
> regression routine would respect the contract implicit in the interface
> MultipleLinearRegression. However, large regression problems are important
> enough that there should be a way to:
>
> 1. Wrap a potentially large data source, perhaps as an InputStream of some
> sort.
> 2. Have a separate contract with methods like clear() (to clear whatever
> intermediate calculations are stored) and regress(), which generates
> immutable results that are not affected by further updates of the data.
>
> I would appreciate any thoughts or comments, as well as suggestions about
> functionality already in math commons which might address some points I
> raised.
>
> Thank you,
>
> -Greg
>
