Hi Debasish,

The L-BFGS solver will run on the master, like the GD solver; the part
that is parallelized is computing the gradient of each input row and
summing the results.
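
Concretely, the parallel part looks roughly like this (a sketch only;
computeGradient is an illustrative placeholder, not the actual code in
the PR):

// data: RDD[(Double, DenseVector[Double])], i.e. (label, features)
// weights: the current DenseVector[Double] of model coefficients
val (gradientSum, lossSum) = data
  .map { case (label, features) =>
    // per-row gradient and loss, computed on the workers
    computeGradient(weights, label, features)
  }
  .reduce { case ((g1, l1), (g2, l2)) => (g1 + g2, l1 + l2) }
// the summed gradient then drives the L-BFGS step on the master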

I prefer to make the optimizer pluggable instead of adding a new
LogisticRegressionWithLBFGS, since 98% of the code would be the same.

It would be nice to have something like this:

class LogisticRegression private (
    var optimizer: Optimizer)
  extends GeneralizedLinearAlgorithm[LogisticRegressionModel]

The following parameters will be set up in the optimizers, and they
should be, because they are optimization parameters:

    var stepSize: Double,
    var numIterations: Int,
    var regParam: Double,
    var miniBatchFraction: Double
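
For example, configuring the optimizer could look like this (a sketch
of the proposed API; the pluggable LogisticRegression constructor does
not exist yet):

val gd = new GradientDescent(new LogisticGradient(), new SimpleUpdater())
  .setStepSize(1.0)
  .setNumIterations(100)
  .setRegParam(0.01)
  .setMiniBatchFraction(1.0)
val lr = new LogisticRegression(gd)  // plug in the optimizer, nothing else changes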

Xiangrui, what do you think?

For now, you can use my L-BFGS solver by copying and pasting the
LogisticRegressionWithSGD code and changing the optimizer to L-BFGS.
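
That is, something like this (a sketch; it assumes the L-BFGS optimizer
in the PR takes the same (gradient, updater) constructor):

class LogisticRegressionWithLBFGS private ()
  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {

  val gradient = new LogisticGradient()
  val updater = new SimpleUpdater()
  // identical to LogisticRegressionWithSGD except for this line:
  override val optimizer = new LBFGS(gradient, updater)
}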

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi DB,
>
> Are we going to clean up the class:
>
> class LogisticRegressionWithSGD private (
>     var stepSize: Double,
>     var numIterations: Int,
>     var regParam: Double,
>     var miniBatchFraction: Double)
>   extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with
> Serializable {
>
>   val gradient = new LogisticGradient()
>   val updater = new SimpleUpdater()
>   override val optimizer = new GradientDescent(gradient, updater)
>
> Or add a new one?
>
> class LogisticRegressionWithBFGS?
>
> The WithABC suffix is optional, since the optimizer could be picked based on
> a flag... there are only 3 options for the optimizer:
>
> 1. GradientDescent
> 2. Quasi Newton
> 3. Newton
>
> Maybe we add an enum for the optimization type... and then under the
> GradientDescent family people can add their variants of SGD... Not sure if
> ConjugateGradient comes under 1 or 2... maybe we need 4 options...
>
> Thanks.
> Deb
>
>
> On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>>
>> I got your check-in... I need to run logistic regression with SGD vs. BFGS
>> for my current use cases, but your next check-in will update logistic
>> regression with L-BFGS, right? Are you adding it to the regression package
>> as well?
>>
>> Thanks.
>> Deb
>>
>>
>> On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>>
>>> Hi guys,
>>>
>>> The latest PR uses Breeze's L-BFGS implementation, which was introduced
>>> by Xiangrui's sparse input format work in SPARK-1212.
>>>
>>> https://github.com/apache/spark/pull/353
>>>
>>> Now, it works with the new sparse framework!
>>>
>>> Any feedback would be greatly appreciated.
>>>
>>> Thanks.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
>>>
>>> On Thu, Apr 3, 2014 at 5:02 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>> > ---------- Forwarded message ----------
>>> > From: David Hall <d...@cs.berkeley.edu>
>>> > Date: Sat, Mar 15, 2014 at 10:02 AM
>>> > Subject: Re: MLLib - Thoughts about refactoring Updater for LBFGS?
>>> > To: DB Tsai <dbt...@alpinenow.com>
>>> >
>>> >
>>> > On Fri, Mar 7, 2014 at 10:56 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>> >>
>>> >> Hi David,
>>> >>
>>> >> Please let me know which version of Breeze has a serializable LBFGS
>>> >> and a built-in CachedDiffFunction once you finish. I'll update the
>>> >> Spark PR to switch from the RISO implementation to the Breeze
>>> >> implementation.
>>> >
>>> >
>>> > The current master (0.7-SNAPSHOT) has these problems fixed.
>>> >
>>> >>
>>> >>
>>> >> Thanks.
>>> >>
>>> >> Sincerely,
>>> >>
>>> >> DB Tsai
>>> >> Machine Learning Engineer
>>> >> Alpine Data Labs
>>> >> --------------------------------------
>>> >> Web: http://alpinenow.com/
>>> >>
>>> >>
>>> >> On Thu, Mar 6, 2014 at 4:26 PM, David Hall <d...@cs.berkeley.edu>
>>> >> wrote:
>>> >> > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com>
>>> >> > wrote:
>>> >> >
>>> >> >> Hi David,
>>> >> >>
>>> >> >> I can now get the same result from your Breeze LBFGS and the
>>> >> >> Fortran implementation. I probably made some mistakes when I tried
>>> >> >> Breeze before; I apologize for claiming it's not stable.
>>> >> >>
>>> >> >> See the test case in BreezeLBFGSSuite.scala
>>> >> >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
>>> >> >>
>>> >> >> This is training multinomial logistic regression against the iris
>>> >> >> dataset, and both optimizers can train models with 98% training
>>> >> >> accuracy.
>>> >> >>
>>> >> >
>>> >> > great to hear! There were some bugs in LBFGS about 6 months ago, so
>>> >> > depending on when you last tried it, it might indeed have been buggy.
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> There are two issues with using Breeze in Spark:
>>> >> >>
>>> >> >> 1) When gradientSum and lossSum are computed distributively in a
>>> >> >> custom DiffFunction that is passed into your optimizer, Spark
>>> >> >> complains that the LBFGS class is not serializable. In
>>> >> >> BreezeLBFGS.scala, I have to convert the RDD to an array to make it
>>> >> >> work locally. It should be easy to fix by just having LBFGS
>>> >> >> implement Serializable.
>>> >> >>
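>>> >> >> To make this concrete, here is roughly what the distributed
>>> >> >> DiffFunction looks like (a sketch; lossAndGradient is an
>>> >> >> illustrative helper, not the actual code):
>>> >> >>
>>> >> >> import breeze.linalg.DenseVector
>>> >> >> import breeze.optimize.DiffFunction
>>> >> >>
>>> >> >> val costFun = new DiffFunction[DenseVector[Double]] {
>>> >> >>   def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
>>> >> >>     // the task closure can end up capturing its enclosing scope,
>>> >> >>     // which is how the non-serializable LBFGS instance gets dragged
>>> >> >>     // into Spark's task serialization
>>> >> >>     data.map { case (label, features) =>
>>> >> >>       lossAndGradient(w, label, features) // returns (loss, gradient)
>>> >> >>     }.reduce { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
>>> >> >>   }
>>> >> >> }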
>>> >> >
>>> >> > I'm not sure why Spark should be serializing LBFGS? Shouldn't it
>>> >> > live on
>>> >> > the controller node? Or is this a per-node thing?
>>> >> >
>>> >> > But no problem to make it serializable.
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> 2) Breeze computes redundant gradients and losses. See the
>>> >> >> following log from both the Fortran and Breeze implementations.
>>> >> >>
>>> >> >
>>> >> > Err, yeah. I should probably have LBFGS do this automatically, but
>>> >> > there's
>>> >> > a CachedDiffFunction that gets rid of the redundant calculations.
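>>> >> >
>>> >> > Something like this should do it (an untested sketch):
>>> >> >
>>> >> > import breeze.optimize.{CachedDiffFunction, LBFGS}
>>> >> >
>>> >> > val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 10)
>>> >> > val cached = new CachedDiffFunction(costFun)
>>> >> > val weights = lbfgs.minimize(cached, initialWeights)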
>>> >> >
>>> >> > -- David
>>> >> >
>>> >> >
>>> >> >>
>>> >> >> Thanks.
>>> >> >>
>>> >> >> Fortran:
>>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0
>>> >> >> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
>>> >> >> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
>>> >> >> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
>>> >> >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
>>> >> >> Iteration 4: loss 0.9907956302751622, diff 0.05999907649459571
>>> >> >> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
>>> >> >> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
>>> >> >> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
>>> >> >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
>>> >> >> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
>>> >> >> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
>>> >> >>
>>> >> >> Breeze:
>>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0
>>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>>> >> >> WARNING: Failed to load implementation from:
>>> >> >> com.github.fommil.netlib.NativeSystemBLAS
>>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>>> >> >> WARNING: Failed to load implementation from:
>>> >> >> com.github.fommil.netlib.NativeRefBLAS
>>> >> >> Iteration 0: loss 1.3862943611198926, diff 0.0
>>> >> >> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
>>> >> >> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
>>> >> >> Iteration 3: loss 1.1242501524477688, diff 0.0
>>> >> >> Iteration 4: loss 1.1242501524477688, diff 0.0
>>> >> >> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
>>> >> >> Iteration 6: loss 1.0930151243303563, diff 0.0
>>> >> >> Iteration 7: loss 1.0930151243303563, diff 0.0
>>> >> >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
>>> >> >> Iteration 9: loss 1.054036932835569, diff 0.0
>>> >> >> Iteration 10: loss 1.054036932835569, diff 0.0
>>> >> >> Iteration 11: loss 0.9907956302751622, diff 0.05999907649459571
>>> >> >> Iteration 12: loss 0.9907956302751622, diff 0.0
>>> >> >> Iteration 13: loss 0.9907956302751622, diff 0.0
>>> >> >> Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
>>> >> >> Iteration 15: loss 0.9184205380342829, diff 0.0
>>> >> >> Iteration 16: loss 0.9184205380342829, diff 0.0
>>> >> >> Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
>>> >> >> Iteration 18: loss 0.8259870936519939, diff 0.0
>>> >> >> Iteration 19: loss 0.8259870936519939, diff 0.0
>>> >> >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
>>> >> >> Iteration 21: loss 0.6327447552109576, diff 0.0
>>> >> >> Iteration 22: loss 0.6327447552109576, diff 0.0
>>> >> >> Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
>>> >> >> Iteration 24: loss 0.5534101162436362, diff 0.0
>>> >> >> Iteration 25: loss 0.5534101162436362, diff 0.0
>>> >> >> Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
>>> >> >> Iteration 27: loss 0.40450200866125635, diff 0.0
>>> >> >> Iteration 28: loss 0.40450200866125635, diff 0.0
>>> >> >> Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502
>>> >> >>
>>> >> >> Sincerely,
>>> >> >>
>>> >> >> DB Tsai
>>> >> >> Machine Learning Engineer
>>> >> >> Alpine Data Labs
>>> >> >> --------------------------------------
>>> >> >> Web: http://alpinenow.com/
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu>
>>> >> >> wrote:
>>> >> >> > On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> >> Hi David,
>>> >> >> >>
>>> >> >> >> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >> > I'm happy to help fix any problems. I've verified at points
>>> >> >> >> > that the implementation gives the exact same sequence of
>>> >> >> >> > iterates for a few different functions (with a particular line
>>> >> >> >> > search) as the C port of LBFGS. So I'm a little surprised it
>>> >> >> >> > fails where Fortran succeeds... but only a little. This was
>>> >> >> >> > fixed late last year.
>>> >> >> I'm working on a reproducible test case using the Breeze vs.
>>> >> >> Fortran implementations to show the problem I've run into. The test
>>> >> >> will be in one of the test cases in my Spark fork; is it okay for
>>> >> >> you to investigate the issue there, or do I need to make it a
>>> >> >> standalone test?
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > Um, as long as it wouldn't be too hard to pull out.
>>> >> >> >
>>> >> >> >
>>> >> >> >>
>>> >> >> >> Will send you the test later today.
>>> >> >> >>
>>> >> >> >> Thanks.
>>> >> >> >>
>>> >> >> >> Sincerely,
>>> >> >> >>
>>> >> >> >> DB Tsai
>>> >> >> >> Machine Learning Engineer
>>> >> >> >> Alpine Data Labs
>>> >> >> >> --------------------------------------
>>> >> >> >> Web: http://alpinenow.com/
>>> >> >> >>
>>> >> >>
>>> >
>>> >
>>> >
>>
>>
>
