Yup, that's what I expected: the L-BFGS solver runs on the master, and the gradient computation over the RDD is done on each of the workers...
This miniBatchFraction is also a heuristic which I don't think makes sense for LogisticRegressionWithBFGS...does it?

On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai <dbt...@stanford.edu> wrote:
> Hi Debasish,
>
> The L-BFGS solver will be in the master like the GD solver, and the part
> that is parallelized is computing the gradient of each input row and
> summing them up.
>
> I prefer to make the optimizer pluggable instead of adding a new
> LogisticRegressionWithLBFGS, since 98% of the code will be the same.
>
> It would be nice to have something like this,
>
>     class LogisticRegression private (
>         var optimizer: Optimizer)
>       extends GeneralizedLinearAlgorithm[LogisticRegressionModel]
>
> The following parameters will be set up in the optimizers, and they
> should be, because they are part of the optimization parameters.
>
>     var stepSize: Double,
>     var numIterations: Int,
>     var regParam: Double,
>     var miniBatchFraction: Double
>
> Xiangrui, what do you think?
>
> For now, you can use my L-BFGS solver by copying and pasting the
> LogisticRegressionWithSGD code and changing the optimizer to L-BFGS.
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>> Hi DB,
>>
>> Are we going to clean up the function:
>>
>>     class LogisticRegressionWithSGD private (
>>         var stepSize: Double,
>>         var numIterations: Int,
>>         var regParam: Double,
>>         var miniBatchFraction: Double)
>>       extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
>>
>>       val gradient = new LogisticGradient()
>>       val updater = new SimpleUpdater()
>>       override val optimizer = new GradientDescent(gradient, updater)
>>
>> Or add a new one?
>>
>>     class LogisticRegressionWithBFGS ?
>>
>> The WithABC suffix is optional, since the optimizer could instead be picked
>> based on a flag...there are only 3 options for the optimizer:
>>
>> 1. GradientDescent
>> 2. Quasi-Newton
>> 3. Newton
>>
>> Maybe we add an enum for the optimization type...and then under the
>> GradientDescent family people can add their variants of SGD...Not sure if
>> ConjugateGradient comes under 1 or 2...maybe we need 4 options...
>>
>> Thanks.
>> Deb
>>
>>
>> On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>> I got your checkin...I need to run logistic regression SGD vs BFGS for my
>>> current use cases, but your next checkin will update the logistic regression
>>> with LBFGS, right? Are you adding it to the regression package as well?
>>>
>>> Thanks.
>>> Deb
>>>
>>>
>>> On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>>> Hi guys,
>>>>
>>>> The latest PR uses Breeze's L-BFGS implementation, which was introduced by
>>>> Xiangrui's sparse input format work in SPARK-1212.
>>>>
>>>> https://github.com/apache/spark/pull/353
>>>>
>>>> Now, it works with the new sparse framework!
>>>>
>>>> Any feedback would be greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>> Sincerely,
>>>>
>>>> DB Tsai
>>>> -------------------------------------------------------
>>>> My Blog: https://www.dbtsai.com
>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
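As a rough illustration of the stop-gap DB describes above (copy LogisticRegressionWithSGD and swap in the L-BFGS optimizer), a minimal sketch might look like the following. The class name LogisticRegressionWithLBFGS is hypothetical, and the LBFGS optimizer in org.apache.spark.mllib.optimization is assumed here to come from the PR above and to take a (gradient, updater) pair the way GradientDescent does; none of this is a final API.

    import org.apache.spark.mllib.classification.LogisticRegressionModel
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SimpleUpdater}
    import org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm

    class LogisticRegressionWithLBFGS
      extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {

      val gradient = new LogisticGradient()
      val updater = new SimpleUpdater()
      // Same skeleton as LogisticRegressionWithSGD, with only the optimizer swapped
      // (the LBFGS class here is assumed from the PR, not an existing MLlib API).
      // stepSize and miniBatchFraction drop out because L-BFGS chooses its own step
      // size via line search over the full batch; numIterations, convergence
      // tolerance, and regParam would be configured on the optimizer itself,
      // as DB suggests.
      override val optimizer = new LBFGS(gradient, updater)

      override protected def createModel(weights: Vector, intercept: Double) =
        new LogisticRegressionModel(weights, intercept)
    }

Whether this ships as a separate class or as a single LogisticRegression with a pluggable optimizer, as DB prefers, is exactly the question being discussed here.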
>>>>
>>>>
>>>> On Thu, Apr 3, 2014 at 5:02 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>>>> ---------- Forwarded message ----------
>>>>> From: David Hall <d...@cs.berkeley.edu>
>>>>> Date: Sat, Mar 15, 2014 at 10:02 AM
>>>>> Subject: Re: MLLib - Thoughts about refactoring Updater for LBFGS?
>>>>> To: DB Tsai <dbt...@alpinenow.com>
>>>>>
>>>>>
>>>>> On Fri, Mar 7, 2014 at 10:56 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Please let me know the version of Breeze in which LBFGS can be serialized
>>>>>> and CachedDiffFunction is built into LBFGS once you finish. I'll update
>>>>>> the PR to Spark from the RISO implementation to the Breeze implementation.
>>>>>
>>>>> The current master (0.7-SNAPSHOT) has these problems fixed.
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> DB Tsai
>>>>>> Machine Learning Engineer
>>>>>> Alpine Data Labs
>>>>>> --------------------------------------
>>>>>> Web: http://alpinenow.com/
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 6, 2014 at 4:26 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>>>>> On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> I can converge to the same result with your breeze LBFGS and Fortran
>>>>>>>> implementations now. Probably, I made some mistakes when I tried breeze
>>>>>>>> before. I apologize that I claimed it's not stable.
>>>>>>>>
>>>>>>>> See the test case in BreezeLBFGSSuite.scala
>>>>>>>> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS
>>>>>>>>
>>>>>>>> This is training multinomial logistic regression against the iris
>>>>>>>> dataset, and both optimizers can train the models with 98% training
>>>>>>>> accuracy.
>>>>>>>
>>>>>>> Great to hear! There were some bugs in LBFGS about 6 months ago, so
>>>>>>> depending on the last time you tried it, it might indeed have been bugged.
>>>>>>>
>>>>>>>> There are two issues with using Breeze in Spark:
>>>>>>>>
>>>>>>>> 1) When the gradientSum and lossSum are computed distributively in a
>>>>>>>> custom-defined DiffFunction which will be passed into your optimizer,
>>>>>>>> Spark will complain that the LBFGS class is not serializable. In
>>>>>>>> BreezeLBFGS.scala, I've had to convert the RDD to an array to make it
>>>>>>>> work locally. It should be easy to fix by just having LBFGS implement
>>>>>>>> Serializable.
>>>>>>>
>>>>>>> I'm not sure why Spark should be serializing LBFGS? Shouldn't it live on
>>>>>>> the controller node? Or is this a per-node thing?
>>>>>>>
>>>>>>> But no problem to make it serializable.
>>>>>>>
>>>>>>>> 2) Breeze computes redundant gradients and losses. See the following log
>>>>>>>> from both the Fortran and Breeze implementations.
>>>>>>>
>>>>>>> Err, yeah. I should probably have LBFGS do this automatically, but there's
>>>>>>> a CachedDiffFunction that gets rid of the redundant calculations.
>>>>>>>
>>>>>>> -- David
>>>>>>>
>>>>>>>> Thanks.
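To make issue 1) concrete, the pattern being discussed is roughly the following: the DiffFunction handed to Breeze computes lossSum and gradientSum with an RDD map/reduce, so only the small per-record closure is shipped to the workers while the optimizer object itself stays on the driver. This is a sketch for binary logistic regression with dense features under those assumptions, not the code in BreezeLBFGS.scala:

    import breeze.linalg.{DenseVector => BDV}
    import breeze.optimize.DiffFunction
    import org.apache.spark.rdd.RDD

    // points: RDD of (label, features) pairs with labels in {0.0, 1.0}.
    def binaryLogisticCost(points: RDD[(Double, Array[Double])]): DiffFunction[BDV[Double]] =
      new DiffFunction[BDV[Double]] {
        override def calculate(w: BDV[Double]): (Double, BDV[Double]) = {
          val weights = w.toArray
          // lossSum and gradientSum are computed distributively on the workers;
          // only this closure (not the LBFGS object) has to be serializable.
          val (gradientSum, lossSum, count) = points.map { case (label, x) =>
            val margin = -1.0 * x.zip(weights).map { case (xi, wi) => xi * wi }.sum
            val multiplier = 1.0 / (1.0 + math.exp(margin)) - label
            val loss =
              if (label > 0) math.log1p(math.exp(margin))
              else math.log1p(math.exp(margin)) - margin
            (x.map(_ * multiplier), loss, 1L)
          }.reduce { case ((g1, l1, c1), (g2, l2, c2)) =>
            (g1.zip(g2).map { case (a, b) => a + b }, l1 + l2, c1 + c2)
          }
          (lossSum / count, BDV(gradientSum.map(_ / count)))
        }
      }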
>>>>>>>>
>>>>>>>> Fortran:
>>>>>>>> Iteration -1: loss 1.3862943611198926, diff 1.0
>>>>>>>> Iteration 0: loss 1.5846343143210866, diff 0.14307193024217352
>>>>>>>> Iteration 1: loss 1.1242501524477688, diff 0.29053004039012126
>>>>>>>> Iteration 2: loss 1.0930151243303563, diff 0.027782962952189336
>>>>>>>> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601
>>>>>>>> Iteration 4: loss 0.9907956302751622, diff 0.05999907649459571
>>>>>>>> Iteration 5: loss 0.9184205380342829, diff 0.07304737423337761
>>>>>>>> Iteration 6: loss 0.8259870936519937, diff 0.10064381175132982
>>>>>>>> Iteration 7: loss 0.6327447552109574, diff 0.23395293458364716
>>>>>>>> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277
>>>>>>>> Iteration 9: loss 0.4045020086612566, diff 0.26907321376758075
>>>>>>>> Iteration 10: loss 0.3078824990823728, diff 0.23885980452569627
>>>>>>>>
>>>>>>>> Breeze:
>>>>>>>> Iteration -1: loss 1.3862943611198926, diff 1.0
>>>>>>>> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>>>>>>>> WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
>>>>>>>> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit>
>>>>>>>> WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
>>>>>>>> Iteration 0: loss 1.3862943611198926, diff 0.0
>>>>>>>> Iteration 1: loss 1.5846343143210866, diff 0.14307193024217352
>>>>>>>> Iteration 2: loss 1.1242501524477688, diff 0.29053004039012126
>>>>>>>> Iteration 3: loss 1.1242501524477688, diff 0.0
>>>>>>>> Iteration 4: loss 1.1242501524477688, diff 0.0
>>>>>>>> Iteration 5: loss 1.0930151243303563, diff 0.027782962952189336
>>>>>>>> Iteration 6: loss 1.0930151243303563, diff 0.0
>>>>>>>> Iteration 7: loss 1.0930151243303563, diff 0.0
>>>>>>>> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601
>>>>>>>> Iteration 9: loss 1.054036932835569, diff 0.0
>>>>>>>> Iteration 10: loss 1.054036932835569, diff 0.0
>>>>>>>> Iteration 11: loss 0.9907956302751622, diff 0.05999907649459571
>>>>>>>> Iteration 12: loss 0.9907956302751622, diff 0.0
>>>>>>>> Iteration 13: loss 0.9907956302751622, diff 0.0
>>>>>>>> Iteration 14: loss 0.9184205380342829, diff 0.07304737423337761
>>>>>>>> Iteration 15: loss 0.9184205380342829, diff 0.0
>>>>>>>> Iteration 16: loss 0.9184205380342829, diff 0.0
>>>>>>>> Iteration 17: loss 0.8259870936519939, diff 0.1006438117513297
>>>>>>>> Iteration 18: loss 0.8259870936519939, diff 0.0
>>>>>>>> Iteration 19: loss 0.8259870936519939, diff 0.0
>>>>>>>> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647
>>>>>>>> Iteration 21: loss 0.6327447552109576, diff 0.0
>>>>>>>> Iteration 22: loss 0.6327447552109576, diff 0.0
>>>>>>>> Iteration 23: loss 0.5534101162436362, diff 0.12538154276652747
>>>>>>>> Iteration 24: loss 0.5534101162436362, diff 0.0
>>>>>>>> Iteration 25: loss 0.5534101162436362, diff 0.0
>>>>>>>> Iteration 26: loss 0.40450200866125635, diff 0.2690732137675816
>>>>>>>> Iteration 27: loss 0.40450200866125635, diff 0.0
>>>>>>>> Iteration 28: loss 0.40450200866125635, diff 0.0
>>>>>>>> Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>>
>>>>>>>> DB Tsai
>>>>>>>> Machine Learning Engineer
>>>>>>>> Alpine Data Labs
>>>>>>>> --------------------------------------
>>>>>>>> Web: http://alpinenow.com/
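The repeated losses with diff 0.0 in the Breeze log above are the redundant evaluations David mentions, and CachedDiffFunction is the fix he points to: wrapping the cost function caches the most recent (loss, gradient) pair so evaluations at the same point are not recomputed. A rough sketch of the driver-side wiring, assuming Breeze ~0.7's breeze.optimize API and the binaryLogisticCost sketch earlier, with points and numFeatures supplied by the caller:

    import breeze.linalg.{DenseVector => BDV}
    import breeze.optimize.{CachedDiffFunction, LBFGS}

    // Driver-side only: the LBFGS object never leaves the master, so it does not
    // need to be serializable for this pattern to work.
    val lbfgs = new LBFGS[BDV[Double]](100, 10)  // maxIter, number of corrections
    val cachedCost = new CachedDiffFunction(binaryLogisticCost(points))
    val initialWeights = BDV.zeros[Double](numFeatures)
    val weights = lbfgs.minimize(cachedCost, initialWeights)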
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>>>>>>> On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com> wrote:
>>>>>>>>>>> I'm happy to help fix any problems. I've verified at points that the
>>>>>>>>>>> implementation gives the exact same sequence of iterates for a few
>>>>>>>>>>> different functions (with a particular line search) as the C port of
>>>>>>>>>>> lbfgs. So I'm a little surprised it fails where Fortran succeeds... but
>>>>>>>>>>> only a little. This was fixed late last year.
>>>>>>>>>>
>>>>>>>>>> I'm working on a reproducible test case using the Breeze vs. Fortran
>>>>>>>>>> implementations to show the problem I've run into. The test will be one
>>>>>>>>>> of the test cases in my Spark fork; is it okay for you to investigate
>>>>>>>>>> the issue there? Or do I need to make it a standalone test?
>>>>>>>>>
>>>>>>>>> Um, as long as it wouldn't be too hard to pull out.
>>>>>>>>>
>>>>>>>>>> Will send you the test later today.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Sincerely,
>>>>>>>>>>
>>>>>>>>>> DB Tsai
>>>>>>>>>> Machine Learning Engineer
>>>>>>>>>> Alpine Data Labs
>>>>>>>>>> --------------------------------------
>>>>>>>>>> Web: http://alpinenow.com/