Hi!

I'm currently working on a rather large-scale dataset (~300M samples
represented as dense vectors of dimensionality ~100).
The data lives on an EC2 Hadoop cluster and is pre-processed using MR
jobs, including heavy use of Mahout (Lanczos decomposition, clustering,
etc.).
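
In case the storage format matters: I'm assuming below that the MR jobs
leave the vectors as <IntWritable, VectorWritable> SequenceFiles (the
key type and path are placeholders), so whatever learner we pick would
need to consume records read roughly like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public class DumpVectors {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // placeholder path; in reality one part file per reducer
        Path path = new Path("/data/vectors/part-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        IntWritable key = new IntWritable();        // assumed key type
        VectorWritable value = new VectorWritable();
        while (reader.next(key, value)) {
          Vector v = value.get();   // dense vector, ~100 entries
          // hand v off to whatever learner ends up consuming the data
        }
        reader.close();
      }
    }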

I'm now looking for a way to learn a logistic regression model on this
data.
So far I've postponed this part of the project, hoping for MAHOUT-228
<https://issues.apache.org/jira/browse/MAHOUT-228> to be ready... but
unfortunately I can't afford to wait any longer :)
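
For concreteness, what I mean is plain binary logistic regression
trained with SGD. A minimal single-machine sketch of the update I'm
after (the dimension and learning rate are placeholders, and there's no
regularization):

    public class LogRegSgd {
      private final double[] w;   // one weight per feature (~100 here)
      private final double eta;   // learning rate; value is a placeholder

      public LogRegSgd(int dim, double eta) {
        this.w = new double[dim];
        this.eta = eta;
      }

      // One stochastic gradient step on sample x with label y in {0, 1}.
      public void train(double[] x, int y) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        double p = 1.0 / (1.0 + Math.exp(-z));   // sigmoid(w . x)
        double err = y - p;   // gradient of the per-sample log-likelihood
        for (int i = 0; i < w.length; i++) w[i] += eta * err * x[i];
      }
    }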

Looking around, I've found Google's sofia-ml
<http://code.google.com/p/sofia-ml/> and a UC Berkeley Hadoop-based
implementation
<http://berkeley-mltea.pbworks.com/Hadoop-for-Machine-Learning-Guide>.
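
If I'm reading the sofia-ml docs right, training would be invoked
something like the following (file names are placeholders, the data
would first need exporting to SVM-light format, and I set dimensionality
to feature count + 1 to leave room for the bias term):

    ./sofia-ml --learner_type logreg-pegasos --loop_type stochastic \
               --training_file train.svmlight --dimensionality 101 \
               --iterations 100000 --model_out model.txt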
Does anyone have experience with these, or know of a good library for
logistic regression at this scale?

Thanks,
Danny
