Hi guys,

First of all, we would like to thank the whole Spark community for building such a great platform for big data processing. We built multinomial logistic regression (MLOR) with an L-BFGS optimizer in Spark. L-BFGS is a limited-memory quasi-Newton method, which allows us to train on very high-dimensional data without computing the Hessian matrix that Newton's method requires.
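For anyone who hasn't driven an L-BFGS routine before, here is a minimal sketch of the interface against a toy in-memory binary logistic loss. It uses Breeze's breeze.optimize.LBFGS purely for illustration (our patch is built on the RISO implementation mentioned below, and in MLlib the loss and gradient would be aggregated over an RDD rather than computed locally):

    import breeze.linalg.{DenseMatrix, DenseVector}
    import breeze.optimize.{DiffFunction, LBFGS}

    // Toy in-memory dataset: 4 examples, 2 features, binary labels.
    val x = DenseMatrix((1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0))
    val y = DenseVector(1.0, 1.0, 0.0, 0.0)

    // The optimizer only ever asks for (loss, gradient) at the current weights.
    val loss = new DiffFunction[DenseVector[Double]] {
      def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
        val margins = x * w
        var f = 0.0
        val grad = DenseVector.zeros[Double](w.length)
        for (i <- 0 until y.length) {
          val p = 1.0 / (1.0 + math.exp(-margins(i)))
          f += -y(i) * math.log(p) - (1.0 - y(i)) * math.log(1.0 - p)
          grad += x(i, ::).t * (p - y(i))
        }
        (f, grad)
      }
    }

    // L-BFGS keeps only the last m (step, gradient-change) pairs, so the
    // inverse Hessian is approximated implicitly and never materialized.
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 50, m = 10)
    val weights = lbfgs.minimize(loss, DenseVector.zeros[Double](2))

The point of the sketch is just the shape of the contract: the optimizer asks for the loss and gradient at a given weight vector and returns the next weights, with no Hessian and no step-size tuning on our side.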
At the Strata Conference, we gave a demo using Spark with our MLOR to train on the mnist8m dataset. We were able to train the model in 5 minutes with 50 iterations and get 86% accuracy. The first iteration takes 19.8s, and the remaining iterations take about 5~7s each. We also compared L-BFGS against SGD, and we often saw L-BFGS take 10x fewer steps while the cost per step is the same (just computing the gradient). The following paper from Prof. Ng's group at Stanford compares different optimizers, including L-BFGS and SGD; it is in the context of deep learning, but it is worth reading as a reference: http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf

We would like to break our MLOR with L-BFGS into three patches to contribute to the community.

1) The L-BFGS optimizer, which can be used in logistic regression and linear regression, or to replace any algorithm currently using SGD. The core LBFGS Java implementation we use underneath is from the RISO project, and the author, Robert, was kind enough to relicense it under a GPL and Apache 2 dual license. We're almost ready to submit a PR for LBFGS; see our GitHub fork: https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS

However, we don't use Updater in LBFGS, since it is designed for gradient descent; for L-BFGS we don't need a stepSize, an adaptive learning rate, etc. Since it seems difficult to fit L-BFGS into the current Updater logic (in the lbfgs library, the new weights are returned given the old weights, loss, and gradient), I was thinking of abstracting out the code that computes the gradient and the loss term of the regularization into a separate place, so that different optimizers can reuse it. Any suggestions about this?

2) and 3) We will add the MLOR gradient to MLlib and add a few examples. Finally, we will make a tweak using mapPartitions instead of map to further improve performance; a rough sketch of that idea is attached after the signature.

Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/
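P.S. For anyone curious about the mapPartitions tweak mentioned above, here is a rough sketch of the idea; the function names are made up for illustration and are not the actual patch. The point is to fold each partition into a single (loss, gradient) accumulator locally, so only one pair per partition is shuffled, instead of creating one small gradient object per record with map:

    import breeze.linalg.DenseVector
    import org.apache.spark.rdd.RDD

    // Hypothetical per-example loss/gradient for binary logistic regression,
    // given the current weights.
    def exampleLossAndGradient(
        label: Double,
        features: DenseVector[Double],
        weights: DenseVector[Double]): (Double, DenseVector[Double]) = {
      val margin = features.dot(weights)
      val p = 1.0 / (1.0 + math.exp(-margin))
      val loss = -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)
      (loss, features * (p - label))
    }

    // Instead of data.map(...).reduce(...), accumulate per partition and
    // emit a single (loss, gradient) pair per partition.
    def totalLossAndGradient(
        data: RDD[(Double, DenseVector[Double])],
        weights: DenseVector[Double],
        numFeatures: Int): (Double, DenseVector[Double]) = {
      data.mapPartitions { iter =>
        var lossSum = 0.0
        val gradSum = DenseVector.zeros[Double](numFeatures)
        iter.foreach { case (label, features) =>
          val (l, g) = exampleLossAndGradient(label, features, weights)
          lossSum += l
          gradSum += g
        }
        Iterator((lossSum, gradSum))
      }.reduce { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
    }

    // The regularization term could then be added on top of the aggregated
    // data loss, independently of which optimizer drives the iterations,
    // e.g. for L2:
    //   totalLoss = dataLoss + 0.5 * regParam * (weights dot weights)
    //   totalGrad = dataGrad + weights * regParam

This aggregate would be exactly what gets handed to the L-BFGS routine at each iteration, which is also where the regularization abstraction mentioned in point 1) would plug in.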