Hello,

I recently implemented a single-pass algorithm for penalized linear
regression with cross-validation at a big data start-up, and I'd like to
contribute it to Mahout.

Penalized linear regression methods such as Lasso and Elastic Net are widely
used in machine learning, but there are no efficient, scalable
implementations on MapReduce.

The published distributed algorithms for this problem are either
iterative (a poor fit for MapReduce; see Stephen Boyd's paper) or
approximate (no help when exact solutions are needed; see parallelized
stochastic gradient descent). Another disadvantage of these algorithms is
that they cannot do cross-validation during training: the user must provide
the penalty parameter in advance.

My approach trains the model with cross-validation in a single pass. It is
based on a few simple observations. I will post the details on arXiv and
share the link in a follow-up email.
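
Until the write-up is available, the following is only a minimal sketch of
one standard way a single pass can suffice. It is not necessarily the
approach proposed here. It assumes the Lasso objective
0.5*||y - Xb||^2 + lam*||b||_1 with no intercept, round-robin fold
assignment, and few enough features that a d x d matrix fits in memory. All
function names are hypothetical. The idea is that the per-fold sums X'X, X'y
and y'y are additive across rows, so they can be collected in one pass (for
example by mappers and a summing reducer), after which every fold and every
penalty value can be evaluated without touching the data again.

```python
import math

def accumulate(rows, n_folds):
    """Single pass over (x, y) pairs: per-fold sums of x x', x*y and y*y."""
    stats = []
    for i, (x, y) in enumerate(rows):
        d = len(x)
        if not stats:
            stats = [{"G": [[0.0] * d for _ in range(d)],
                      "c": [0.0] * d, "yy": 0.0} for _ in range(n_folds)]
        s = stats[i % n_folds]  # round-robin fold assignment (an assumption)
        for j in range(d):
            s["c"][j] += x[j] * y
            for k in range(d):
                s["G"][j][k] += x[j] * x[k]
        s["yy"] += y * y
    return stats

def add(a, b, sign=1.0):
    """Return the statistics a + sign * b; valid because the sums are additive."""
    d = len(a["c"])
    return {"G": [[a["G"][j][k] + sign * b["G"][j][k] for k in range(d)]
                  for j in range(d)],
            "c": [a["c"][j] + sign * b["c"][j] for j in range(d)],
            "yy": a["yy"] + sign * b["yy"]}

def lasso_cd(s, lam, sweeps=200):
    """Coordinate descent using only G = X'X and c = X'y (fixed sweep count)."""
    G, c = s["G"], s["c"]
    d = len(c)
    b = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            if G[j][j] == 0.0:
                continue  # feature is all zeros in this training set
            rho = c[j] - sum(G[j][k] * b[k] for k in range(d) if k != j)
            # Soft-threshold rho by lam, then rescale by the diagonal entry.
            b[j] = math.copysign(max(abs(rho) - lam, 0.0), rho) / G[j][j]
    return b

def sse(s, b):
    """||y - Xb||^2 expanded as y'y - 2 b'X'y + b'X'Xb."""
    d = len(b)
    quad = sum(b[j] * s["G"][j][k] * b[k] for j in range(d) for k in range(d))
    return s["yy"] - 2.0 * sum(b[j] * s["c"][j] for j in range(d)) + quad

def cv_errors(stats, lams):
    """Held-out squared error per penalty; training stats = total - held-out."""
    total = stats[0]
    for s in stats[1:]:
        total = add(total, s)
    return {lam: sum(sse(held, lasso_cd(add(total, held, -1.0), lam))
                     for held in stats)
            for lam in lams}
```

A caller would run accumulate once over the data and then pass the result to
cv_errors with a list of candidate penalties. The sketch uses a fixed number
of coordinate-descent sweeps rather than a convergence test, which a real
implementation would want.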

Any feedback would be helpful.

Thanks
-Michael
