I haven't experimented with it; that's just a use-case I could think of in theory. ^^ However, from what I've seen, BFGS converges really fast, so I only need 20~30 iterations in general.
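To make that warm-start use-case concrete, here is a rough sketch (the trainLBFGS helper below is hypothetical, just standing in for whatever L-BFGS solver ends up in MLlib, and the 10% sample fraction is arbitrary):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vector

// Warm-start sketch: run L-BFGS on a small subsample first, then use the
// resulting weights as the initial point for a full-data run.
// trainLBFGS(data, initialWeights) is a hypothetical helper, not an existing API.
def warmStart(data: RDD[LabeledPoint],
              initialWeights: Vector,
              trainLBFGS: (RDD[LabeledPoint], Vector) => Vector): Vector = {
  // Cheap pass over ~10% of the data to get near the optimum.
  val subsample = data.sample(false, 0.1, 42)
  val roughWeights = trainLBFGS(subsample, initialWeights)

  // Full-data run starting from the rough weights, which should cut down
  // the 20~30 iterations even further.
  trainLBFGS(data, roughWeights)
}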
Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Tue, Apr 8, 2014 at 4:45 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Have you experimented with it? For logistic regression at least, given the
> iterations/tolerance that you are using, BFGS should converge to the same
> solution either way....
>
>
> On Tue, Apr 8, 2014 at 4:19 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>
>> I think mini-batch is still useful for L-BFGS.
>>
>> One of the use-cases can be initializing the weights by training on a
>> smaller subsample of the data using mini-batch with L-BFGS.
>>
>> Then we could use the weights trained with the mini-batch to start another
>> training run with the full data.
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Tue, Apr 8, 2014 at 4:05 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>> > Yup, that's what I expected... the L-BFGS solver is in the master, and the
>> > gradient computation per RDD is done on each of the workers...
>> >
>> > This miniBatchFraction is also a heuristic which I don't think makes sense
>> > for LogisticRegressionWithBFGS... does it?
>> >
>> >
>> > On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai <dbt...@stanford.edu> wrote:
>> >>
>> >> Hi Debasish,
>> >>
>> >> The L-BFGS solver will be in the master like the GD solver, and the part
>> >> that is parallelized is computing the gradient of each input row and
>> >> summing them up.
>> >>
>> >> I prefer to make the optimizer plug-able instead of adding a new
>> >> LogisticRegressionWithLBFGS, since 98% of the code will be the same.
>> >>
>> >> It would be nice to have something like this:
>> >>
>> >> class LogisticRegression private (
>> >>     var optimizer: Optimizer)
>> >>   extends GeneralizedLinearAlgorithm[LogisticRegressionModel]
>> >>
>> >> The following parameters will be set up in the optimizers, and they
>> >> should be, because they are optimization parameters:
>> >>
>> >>     var stepSize: Double,
>> >>     var numIterations: Int,
>> >>     var regParam: Double,
>> >>     var miniBatchFraction: Double
>> >>
>> >> Xiangrui, what do you think?
>> >>
>> >> For now, you can use my L-BFGS solver by copying and pasting the
>> >> LogisticRegressionWithSGD code and changing the optimizer to L-BFGS.
>> >>
>> >> Sincerely,
>> >>
>> >> DB Tsai
>> >> -------------------------------------------------------
>> >> My Blog: https://www.dbtsai.com
>> >> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>
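To make the plug-able optimizer idea above a bit more concrete, here is a minimal sketch (the Optimizer trait and the simplified LogisticRegression class below are illustrative only, not the final MLlib API):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vector

// Sketch only: a minimal Optimizer abstraction that both GradientDescent and
// an L-BFGS solver could implement, so the model class takes it as a parameter
// instead of hard-coding one solver.
trait Optimizer extends Serializable {
  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector
}

// Simplified stand-in for the GLM: it only delegates to whatever optimizer it
// was constructed with. stepSize / numIterations / regParam / miniBatchFraction
// would live inside the concrete optimizer, since they are optimization
// parameters rather than model parameters.
class LogisticRegression(var optimizer: Optimizer) extends Serializable {
  def run(data: RDD[(Double, Vector)], initialWeights: Vector): Vector =
    optimizer.optimize(data, initialWeights)
}

// Usage sketch: swapping solvers without touching the model code.
//   val lrSGD   = new LogisticRegression(new GradientDescent(gradient, updater))
//   val lrLBFGS = new LogisticRegression(new LBFGS(gradient, updater))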
>> >> On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>> >> > Hi DB,
>> >> >
>> >> > Are we going to clean up the function:
>> >> >
>> >> > class LogisticRegressionWithSGD private (
>> >> >     var stepSize: Double,
>> >> >     var numIterations: Int,
>> >> >     var regParam: Double,
>> >> >     var miniBatchFraction: Double)
>> >> >   extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
>> >> >
>> >> >   val gradient = new LogisticGradient()
>> >> >   val updater = new SimpleUpdater()
>> >> >   override val optimizer = new GradientDescent(gradient, updater)
>> >> >
>> >> > Or add a new one?
>> >> >
>> >> > class LogisticRegressionWithBFGS ?
>> >> >
>> >> > The WithABC is optional, since the optimizer could be picked based on a
>> >> > flag... there are only 3 options for the optimizer:
>> >> >
>> >> > 1. GradientDescent
>> >> > 2. Quasi-Newton
>> >> > 3. Newton
>> >> >
>> >> > Maybe we add an enum for the optimization type... and then under the
>> >> > GradientDescent family people can add their variants of SGD.... Not sure
>> >> > if ConjugateGradient comes under 1 or 2... maybe we need 4 options...
>> >> >
>> >> > Thanks.
>> >> > Deb
>> >> >
>> >> >
>> >> > On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>> >> >>
>> >> >> I got your checkin.... I need to run logistic regression SGD vs BFGS for
>> >> >> my current use-cases, but your next checkin will update logistic
>> >> >> regression with LBFGS, right? Are you adding it to the regression
>> >> >> package as well?
>> >> >>
>> >> >> Thanks.
>> >> >> Deb
>> >> >>
>> >> >>
>> >> >> On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai <dbt...@stanford.edu> wrote:
>> >> >>>
>> >> >>> Hi guys,
>> >> >>>
>> >> >>> The latest PR uses Breeze's L-BFGS implementation, which is introduced
>> >> >>> by Xiangrui's sparse input format work in SPARK-1212.
>> >> >>>
>> >> >>> https://github.com/apache/spark/pull/353
>> >> >>>
>> >> >>> Now it works with the new sparse framework!
>> >> >>>
>> >> >>> Any feedback would be greatly appreciated.
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>> Sincerely,
>> >> >>>
>> >> >>> DB Tsai
>> >> >>> -------------------------------------------------------
>> >> >>> My Blog: https://www.dbtsai.com
>> >> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Apr 3, 2014 at 5:02 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> > ---------- Forwarded message ----------
>> >> >>> > From: David Hall <d...@cs.berkeley.edu>
>> >> >>> > Date: Sat, Mar 15, 2014 at 10:02 AM
>> >> >>> > Subject: Re: MLLib - Thoughts about refactoring Updater for LBFGS?
>> >> >>> > To: DB Tsai <dbt...@alpinenow.com>
>> >> >>> >
>> >> >>> >
>> >> >>> > On Fri, Mar 7, 2014 at 10:56 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >>
>> >> >>> >> Hi David,
>> >> >>> >>
>> >> >>> >> Please let me know which version of Breeze has LBFGS serializable and
>> >> >>> >> CachedDiffFunction built into LBFGS once you finish. I'll update the
>> >> >>> >> Spark PR from the RISO implementation to the Breeze implementation.
>> >> >>> >
>> >> >>> >
>> >> >>> > The current master (0.7-SNAPSHOT) has these problems fixed.
>> >> >>> >
>> >> >>> >>
>> >> >>> >> Thanks.
>> >> >>> >>
>> >> >>> >> Sincerely,
>> >> >>> >>
>> >> >>> >> DB Tsai
>> >> >>> >> Machine Learning Engineer
>> >> >>> >> Alpine Data Labs
>> >> >>> >> --------------------------------------
>> >> >>> >> Web: http://alpinenow.com/
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Thu, Mar 6, 2014 at 4:26 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >> >>> >> > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >> >
>> >> >>> >> >> Hi David,
>> >> >>> >> >>
>> >> >>> >> >> I can converge to the same result with your breeze LBFGS and Fortran
>> >> >>> >> >> implementations now. Probably I made some mistakes when I tried
>> >> >>> >> >> breeze before. I apologize for claiming it's not stable.
>> >> >>> >> >> >> >> >>> >> >> See the test case in BreezeLBFGSSuite.scala >> >> >>> >> >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS >> >> >>> >> >> >> >> >>> >> >> This is training multinomial logistic regression against iris >> >> >>> >> >> dataset, >> >> >>> >> >> and both optimizers can train the models with 98% training >> >> >>> >> >> accuracy. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > great to hear! There were some bugs in LBFGS about 6 months >> >> >>> >> > ago, >> >> >>> >> > so >> >> >>> >> > depending on the last time you tried it, it might indeed have >> >> >>> >> > been >> >> >>> >> > bugged. >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> There are two issues to use Breeze in Spark, >> >> >>> >> >> >> >> >>> >> >> 1) When the gradientSum and lossSum are computed >> >> >>> >> >> distributively >> >> >>> >> >> in >> >> >>> >> >> custom defined DiffFunction which will be passed into your >> >> >>> >> >> optimizer, >> >> >>> >> >> Spark will complain LBFGS class is not serializable. In >> >> >>> >> >> BreezeLBFGS.scala, I've to convert RDD to array to make it >> >> >>> >> >> work >> >> >>> >> >> locally. It should be easy to fix by just having LBFGS to >> >> >>> >> >> implement >> >> >>> >> >> Serializable. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > I'm not sure why Spark should be serializing LBFGS? Shouldn't >> >> >>> >> > it >> >> >>> >> > live on >> >> >>> >> > the controller node? Or is this a per-node thing? >> >> >>> >> > >> >> >>> >> > But no problem to make it serializable. >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> 2) Breeze computes redundant gradient and loss. See the >> >> >>> >> >> following >> >> >>> >> >> log >> >> >>> >> >> from both Fortran and Breeze implementations. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > Err, yeah. I should probably have LBFGS do this automatically, >> >> >>> >> > but >> >> >>> >> > there's >> >> >>> >> > a CachedDiffFunction that gets rid of the redundant >> >> >>> >> > calculations. >> >> >>> >> > >> >> >>> >> > -- David >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> Thanks. 
>> >> >>> >> >> >> >> >>> >> >> Fortran: >> >> >>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0 >> >> >>> >> >> Iteration 0: loss 1.5846343143210866, diff >> >> >>> >> >> 0.14307193024217352 >> >> >>> >> >> Iteration 1: loss 1.1242501524477688, diff >> >> >>> >> >> 0.29053004039012126 >> >> >>> >> >> Iteration 2: loss 1.0930151243303563, diff >> >> >>> >> >> 0.027782962952189336 >> >> >>> >> >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601 >> >> >>> >> >> Iteration 4: loss 0.9907956302751622, diff >> >> >>> >> >> 0.05999907649459571 >> >> >>> >> >> Iteration 5: loss 0.9184205380342829, diff >> >> >>> >> >> 0.07304737423337761 >> >> >>> >> >> Iteration 6: loss 0.8259870936519937, diff >> >> >>> >> >> 0.10064381175132982 >> >> >>> >> >> Iteration 7: loss 0.6327447552109574, diff >> >> >>> >> >> 0.23395293458364716 >> >> >>> >> >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277 >> >> >>> >> >> Iteration 9: loss 0.4045020086612566, diff >> >> >>> >> >> 0.26907321376758075 >> >> >>> >> >> Iteration 10: loss 0.3078824990823728, diff >> >> >>> >> >> 0.23885980452569627 >> >> >>> >> >> >> >> >>> >> >> Breeze: >> >> >>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0 >> >> >>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit> >> >> >>> >> >> WARNING: Failed to load implementation from: >> >> >>> >> >> com.github.fommil.netlib.NativeSystemBLAS >> >> >>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit> >> >> >>> >> >> WARNING: Failed to load implementation from: >> >> >>> >> >> com.github.fommil.netlib.NativeRefBLAS >> >> >>> >> >> Iteration 0: loss 1.3862943611198926, diff 0.0 >> >> >>> >> >> Iteration 1: loss 1.5846343143210866, diff >> >> >>> >> >> 0.14307193024217352 >> >> >>> >> >> Iteration 2: loss 1.1242501524477688, diff >> >> >>> >> >> 0.29053004039012126 >> >> >>> >> >> Iteration 3: loss 1.1242501524477688, diff 0.0 >> >> >>> >> >> Iteration 4: loss 1.1242501524477688, diff 0.0 >> >> >>> >> >> Iteration 5: loss 1.0930151243303563, diff >> >> >>> >> >> 0.027782962952189336 >> >> >>> >> >> Iteration 6: loss 1.0930151243303563, diff 0.0 >> >> >>> >> >> Iteration 7: loss 1.0930151243303563, diff 0.0 >> >> >>> >> >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601 >> >> >>> >> >> Iteration 9: loss 1.054036932835569, diff 0.0 >> >> >>> >> >> Iteration 10: loss 1.054036932835569, diff 0.0 >> >> >>> >> >> Iteration 11: loss 0.9907956302751622, diff >> >> >>> >> >> 0.05999907649459571 >> >> >>> >> >> Iteration 12: loss 0.9907956302751622, diff 0.0 >> >> >>> >> >> Iteration 13: loss 0.9907956302751622, diff 0.0 >> >> >>> >> >> Iteration 14: loss 0.9184205380342829, diff >> >> >>> >> >> 0.07304737423337761 >> >> >>> >> >> Iteration 15: loss 0.9184205380342829, diff 0.0 >> >> >>> >> >> Iteration 16: loss 0.9184205380342829, diff 0.0 >> >> >>> >> >> Iteration 17: loss 0.8259870936519939, diff >> >> >>> >> >> 0.1006438117513297 >> >> >>> >> >> Iteration 18: loss 0.8259870936519939, diff 0.0 >> >> >>> >> >> Iteration 19: loss 0.8259870936519939, diff 0.0 >> >> >>> >> >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647 >> >> >>> >> >> Iteration 21: loss 0.6327447552109576, diff 0.0 >> >> >>> >> >> Iteration 22: loss 0.6327447552109576, diff 0.0 >> >> >>> >> >> Iteration 23: loss 0.5534101162436362, diff >> >> >>> >> >> 0.12538154276652747 >> >> >>> >> >> Iteration 24: loss 0.5534101162436362, diff 0.0 >> >> >>> >> >> Iteration 25: loss 0.5534101162436362, diff 0.0 >> >> >>> >> >> Iteration 26: loss 
>> >> >>> >> >>
>> >> >>> >> >> Sincerely,
>> >> >>> >> >>
>> >> >>> >> >> DB Tsai
>> >> >>> >> >> Machine Learning Engineer
>> >> >>> >> >> Alpine Data Labs
>> >> >>> >> >> --------------------------------------
>> >> >>> >> >> Web: http://alpinenow.com/
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >> >>> >> >> > On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >> >> >
>> >> >>> >> >> >> Hi David,
>> >> >>> >> >> >>
>> >> >>> >> >> >> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com> wrote:
>> >> >>> >> >> >> > I'm happy to help fix any problems. I've verified at points that
>> >> >>> >> >> >> > the implementation gives the exact same sequence of iterates for a
>> >> >>> >> >> >> > few different functions (with a particular line search) as the C
>> >> >>> >> >> >> > port of lbfgs. So I'm a little surprised it fails where Fortran
>> >> >>> >> >> >> > succeeds... but only a little. This was fixed late last year.
>> >> >>> >> >> >>
>> >> >>> >> >> >> I'm working on a reproducible test case using the breeze vs. fortran
>> >> >>> >> >> >> implementations to show the problem I've run into. The test will be
>> >> >>> >> >> >> in one of the test cases in my Spark fork; is it okay for you to
>> >> >>> >> >> >> investigate the issue there? Or do I need to make it a standalone
>> >> >>> >> >> >> test?
>> >> >>> >> >> >>
>> >> >>> >> >> >
>> >> >>> >> >> > Um, as long as it wouldn't be too hard to pull out.
>> >> >>> >> >> >
>> >> >>> >> >> >
>> >> >>> >> >> >>
>> >> >>> >> >> >> Will send you the test later today.
>> >> >>> >> >> >>
>> >> >>> >> >> >> Thanks.
>> >> >>> >> >> >>
>> >> >>> >> >> >> Sincerely,
>> >> >>> >> >> >>
>> >> >>> >> >> >> DB Tsai
>> >> >>> >> >> >> Machine Learning Engineer
>> >> >>> >> >> >> Alpine Data Labs
>> >> >>> >> >> >> --------------------------------------
>> >> >>> >> >> >> Web: http://alpinenow.com/
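Putting the pieces of this thread together, the distributed gradient/loss computation that gets handed to Breeze could look roughly like the sketch below. This is a hand-rolled binary logistic loss for illustration, not the actual MLlib code (and it is not numerically stable for large margins); only the per-row gradient work runs on the workers, while LBFGS itself stays on the driver:

import breeze.linalg.DenseVector
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS}
import org.apache.spark.rdd.RDD

// Sketch: binary logistic loss/gradient as a distributed sum over
// (label, features) rows, exposed as a Breeze DiffFunction so it can be
// handed to LBFGS on the driver.
class DistributedLogisticLoss(data: RDD[(Double, DenseVector[Double])])
    extends DiffFunction[DenseVector[Double]] with Serializable {

  private val numExamples = data.count().toDouble

  override def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
    // Per-row loss and gradient, computed and summed on the workers.
    val (lossSum, gradSum) = data
      .map { case (label, x) =>
        val margin = w dot x
        // loss = log(1 + exp(margin)) - label * margin, with label in {0, 1}
        val loss = math.log1p(math.exp(margin)) - label * margin
        // gradient = (sigmoid(margin) - label) * x
        val grad = x * (1.0 / (1.0 + math.exp(-margin)) - label)
        (loss, grad)
      }
      .reduce { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
    (lossSum / numExamples, gradSum / numExamples)
  }
}

// Usage sketch:
//   val costFun = new DistributedLogisticLoss(trainingData)
//   val weights = new LBFGS[DenseVector[Double]](50, 10)
//     .minimize(new CachedDiffFunction(costFun), DenseVector.zeros[Double](numFeatures))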