Hi guys,

First of all, we would like to thank the Spark community for
building such a great platform for big data processing. We built
multinomial logistic regression (MLOR) with an LBFGS optimizer in
Spark. LBFGS is a limited-memory quasi-Newton method, which lets us
train on very high-dimensional data without computing the Hessian
matrix that Newton's method requires.
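
For readers less familiar with LBFGS, below is a minimal, self-contained
Scala sketch of the standard two-loop recursion that turns the last m
correction pairs (s_i, y_i) into a search direction. It only illustrates
the algorithm; it is not the RISO implementation we use, and all names
here are ours.

object TwoLoopRecursion {
  type Vec = Array[Double]

  private def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Returns alpha * x + y, elementwise.
  private def axpy(alpha: Double, x: Vec, y: Vec): Vec =
    x.zip(y).map { case (xi, yi) => alpha * xi + yi }

  /**
   * Search direction -H_k * grad via the two-loop recursion.
   * s(i) = x_{i+1} - x_i and y(i) = grad_{i+1} - grad_i, ordered oldest to newest.
   */
  def direction(grad: Vec, s: Seq[Vec], y: Seq[Vec]): Vec = {
    val rho = s.zip(y).map { case (si, yi) => 1.0 / dot(yi, si) }
    var q = grad.clone()
    val alpha = new Array[Double](s.length)

    // First loop: newest to oldest correction pairs.
    for (i <- s.indices.reverse) {
      alpha(i) = rho(i) * dot(s(i), q)
      q = axpy(-alpha(i), y(i), q)
    }

    // Scale by the initial inverse Hessian approximation gamma * I.
    val gamma = if (s.nonEmpty) dot(s.last, y.last) / dot(y.last, y.last) else 1.0
    var r = q.map(_ * gamma)

    // Second loop: oldest to newest correction pairs.
    for (i <- s.indices) {
      val beta = rho(i) * dot(y(i), r)
      r = axpy(alpha(i) - beta, s(i), r)
    }

    // r approximates H_k^{-1} * grad; the descent direction is its negation.
    r.map(-_)
  }
}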

At the Strata Conference, we gave a demo using Spark with our MLOR to
train on the mnist8m dataset. We were able to train the model in 5 minutes
with 50 iterations and reach 86% accuracy. The first iteration takes 19.8s,
and the remaining iterations take about 5~7s each.

We compared LBFGS and SGD, and LBFGS often takes about 10x fewer
steps, while the cost per step is the same (just computing the
gradient).

The following paper by Prof. Ng's group at Stanford compares different
optimizers, including LBFGS and SGD. They use them in the context of
deep learning, but it is worth reading as a reference.
http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf

We would like to break our MLOR-with-LBFGS work into three patches to
contribute to the community.

1) LBFGS optimizer - it can be used in logistic regression and linear
regression, or to replace any algorithm currently using SGD.
The core LBFGS Java implementation we use is from the RISO project,
and the author, Robert, was kind enough to relicense it under a GPL
and Apache 2 dual license.

We're almost ready to submit a PR for LBFGS; see our GitHub fork:
https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS
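
To give a rough picture of what "replacing SGD" means here (the trait
and class names below are placeholders and the bodies are elided; this
is not the code in the fork): both optimizers would expose the same
optimize contract, so a logistic or linear regression trainer only
swaps the optimizer instance.

import org.apache.spark.rdd.RDD

// Simplified stand-in for a pluggable optimizer contract.
trait WeightOptimizer extends Serializable {
  def optimize(data: RDD[(Double, Array[Double])],
               initialWeights: Array[Double]): Array[Double]
}

// SGD-style optimizer: needs a stepSize and a learning-rate schedule.
class SgdLike(stepSize: Double, numIterations: Int) extends WeightOptimizer {
  def optimize(data: RDD[(Double, Array[Double])],
               initialWeights: Array[Double]): Array[Double] = {
    // weights -= stepSize / sqrt(iter) * gradient, repeated numIterations times (elided)
    initialWeights
  }
}

// LBFGS optimizer: no stepSize; it only needs a (loss, gradient) oracle,
// the number of corrections to keep, and a convergence tolerance.
class LbfgsLike(numCorrections: Int, convergenceTol: Double,
                maxIterations: Int) extends WeightOptimizer {
  def optimize(data: RDD[(Double, Array[Double])],
               initialWeights: Array[Double]): Array[Double] = {
    // drive the RISO LBFGS routine with a closure returning (loss, gradient) (elided)
    initialWeights
  }
}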

However, we don't use Updater in LBFGS, since it is designed for GD;
for LBFGS we don't need stepSize, an adaptive learning rate, etc.
It seems difficult to fit LBFGS into the current Updater framework
(in the lbfgs library, the new weights are returned given the old
weights, loss, and gradient), so I was thinking of abstracting out the
code that computes the gradient and the loss term of the regularization
into a separate place so that different optimizers can use it. Any
suggestions about this?
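
One possible shape for that abstraction, as a rough sketch with
hypothetical names (RegularizedCost, perExample): a single cost function
that does one pass over the data, returns (loss, gradient), and folds in
the L2 regularization term, so that both a GD update rule and the LBFGS
line search can consume it.

import org.apache.spark.rdd.RDD

// perExample(label, features, weights) returns that example's (gradient, loss).
class RegularizedCost(
    data: RDD[(Double, Array[Double])],
    perExample: (Double, Array[Double], Array[Double]) => (Array[Double], Double),
    regParam: Double) extends Serializable {

  /** Returns (loss, gradient) at the given weights, L2 regularization included. */
  def compute(weights: Array[Double]): (Double, Array[Double]) = {
    val n = weights.length
    // One distributed pass over the data to sum per-example losses and gradients.
    val (gradSum, lossSum, count) = data
      .map { case (label, features) => perExample(label, features, weights) }
      .aggregate((new Array[Double](n), 0.0, 0L))(
        { case ((g, l, c), (gi, li)) =>
          var j = 0; while (j < n) { g(j) += gi(j); j += 1 }
          (g, l + li, c + 1)
        },
        { case ((g1, l1, c1), (g2, l2, c2)) =>
          var j = 0; while (j < n) { g1(j) += g2(j); j += 1 }
          (g1, l1 + l2, c1 + c2)
        })

    // L2 term: add (regParam / 2) * ||w||^2 to the loss, regParam * w to the gradient.
    val loss = lossSum / count + 0.5 * regParam * weights.map(w => w * w).sum
    val grad = Array.tabulate(n)(j => gradSum(j) / count + regParam * weights(j))
    (loss, grad)
  }
}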

2) and 3) We will add the MLOR gradient to MLlib and add a few
examples. Finally, we will tweak the implementation to use
mapPartitions instead of map to further improve performance.
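
For reference, the mapPartitions idea is the usual trick of accumulating
the gradient into one local array per partition instead of emitting one
(gradient, loss) object per record with map. A simplified binary-logistic
sketch (illustrative only, with hypothetical names; the actual MLOR code
is multinomial):

import org.apache.spark.rdd.RDD

object PartitionGradient {

  /** One gradient/loss pass over (label, features) records, aggregating per partition. */
  def logisticGradient(data: RDD[(Double, Array[Double])],
                       weights: Array[Double]): (Array[Double], Double) = {
    val n = weights.length
    data.mapPartitions { iter =>
      // One local buffer per partition instead of one small object per record.
      val gradSum = new Array[Double](n)
      var lossSum = 0.0
      iter.foreach { case (label, features) =>
        var margin = 0.0
        var j = 0
        while (j < n) { margin += weights(j) * features(j); j += 1 }
        val p = 1.0 / (1.0 + math.exp(-margin))
        val diff = p - label // label is 0.0 or 1.0
        j = 0
        while (j < n) { gradSum(j) += diff * features(j); j += 1 }
        lossSum += (if (label > 0.5) -math.log(p) else -math.log(1.0 - p))
      }
      Iterator((gradSum, lossSum))
    }.reduce { case ((g1, l1), (g2, l2)) =>
      var j = 0
      while (j < n) { g1(j) += g2(j); j += 1 }
      (g1, l1 + l2)
    }
  }
}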

Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/
