I haven't experimented with it; that's just a use-case I could think of in theory. ^^ However, from what I've seen, BFGS converges really fast, so I only need 20~30 iterations in general.
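To make that warm-start use-case concrete, here is a rough sketch (the trainLBFGS helper below is hypothetical, just standing in for whatever L-BFGS solver ends up in MLlib, and the 10% sample fraction is arbitrary):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vector

// Warm-start sketch: run L-BFGS on a small subsample first, then use the
// resulting weights as the initial point for a full-data run.
// trainLBFGS(data, initialWeights) is a hypothetical helper, not an existing API.
def warmStart(data: RDD[LabeledPoint],
              initialWeights: Vector,
              trainLBFGS: (RDD[LabeledPoint], Vector) => Vector): Vector = {
  // Cheap pass over ~10% of the data to get near the optimum.
  val subsample = data.sample(false, 0.1, 42)
  val roughWeights = trainLBFGS(subsample, initialWeights)

  // Full-data run starting from the rough weights, which should cut down
  // the 20~30 iterations even further.
  trainLBFGS(data, roughWeights)
}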
Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Tue, Apr 8, 2014 at 4:45 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Have you experimented with it? For logistic regression at least, given the
> iterations/tolerance that you are using, BFGS should converge to the same
> solution either way....
>
>
> On Tue, Apr 8, 2014 at 4:19 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>
>> I think mini-batch is still useful for L-BFGS.
>>
>> One of the use-cases can be initializing the weights by training on a
>> smaller subsample of the data using mini-batch with L-BFGS.
>>
>> Then we could use the weights trained with the mini-batch to start another
>> training run with the full data.
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Tue, Apr 8, 2014 at 4:05 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>> > Yup, that's what I expected... the L-BFGS solver is in the master, and the
>> > gradient computation per RDD is done on each of the workers...
>> >
>> > This miniBatchFraction is also a heuristic which I don't think makes sense
>> > for LogisticRegressionWithBFGS... does it?
>> >
>> >
>> > On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai <dbt...@stanford.edu> wrote:
>> >>
>> >> Hi Debasish,
>> >>
>> >> The L-BFGS solver will be in the master like the GD solver, and the part
>> >> that is parallelized is computing the gradient of each input row and
>> >> summing them up.
>> >>
>> >> I prefer to make the optimizer plug-able instead of adding a new
>> >> LogisticRegressionWithLBFGS, since 98% of the code will be the same.
>> >>
>> >> It would be nice to have something like this:
>> >>
>> >> class LogisticRegression private (
>> >>     var optimizer: Optimizer)
>> >>   extends GeneralizedLinearAlgorithm[LogisticRegressionModel]
>> >>
>> >> The following parameters will be set up in the optimizers, and they
>> >> should be, because they are optimization parameters:
>> >>
>> >>     var stepSize: Double,
>> >>     var numIterations: Int,
>> >>     var regParam: Double,
>> >>     var miniBatchFraction: Double
>> >>
>> >> Xiangrui, what do you think?
>> >>
>> >> For now, you can use my L-BFGS solver by copying and pasting the
>> >> LogisticRegressionWithSGD code and changing the optimizer to L-BFGS.
>> >>
>> >> Sincerely,
>> >>
>> >> DB Tsai
>> >> -------------------------------------------------------
>> >> My Blog: https://www.dbtsai.com
>> >> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>
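To make the plug-able optimizer idea above a bit more concrete, here is a minimal sketch (the Optimizer trait and the simplified LogisticRegression class below are illustrative only, not the final MLlib API):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vector

// Sketch only: a minimal Optimizer abstraction that both GradientDescent and
// an L-BFGS solver could implement, so the model class takes it as a parameter
// instead of hard-coding one solver.
trait Optimizer extends Serializable {
  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector
}

// Simplified stand-in for the GLM: it only delegates to whatever optimizer it
// was constructed with. stepSize / numIterations / regParam / miniBatchFraction
// would live inside the concrete optimizer, since they are optimization
// parameters rather than model parameters.
class LogisticRegression(var optimizer: Optimizer) extends Serializable {
  def run(data: RDD[(Double, Vector)], initialWeights: Vector): Vector =
    optimizer.optimize(data, initialWeights)
}

// Usage sketch: swapping solvers without touching the model code.
//   val lrSGD   = new LogisticRegression(new GradientDescent(gradient, updater))
//   val lrLBFGS = new LogisticRegression(new LBFGS(gradient, updater))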
>> >> On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>> >> > Hi DB,
>> >> >
>> >> > Are we going to clean up the function:
>> >> >
>> >> > class LogisticRegressionWithSGD private (
>> >> >     var stepSize: Double,
>> >> >     var numIterations: Int,
>> >> >     var regParam: Double,
>> >> >     var miniBatchFraction: Double)
>> >> >   extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
>> >> >
>> >> >   val gradient = new LogisticGradient()
>> >> >   val updater = new SimpleUpdater()
>> >> >   override val optimizer = new GradientDescent(gradient, updater)
>> >> >
>> >> > Or add a new one?
>> >> >
>> >> > class LogisticRegressionWithBFGS ?
>> >> >
>> >> > The WithABC is optional, since the optimizer could be picked based on a
>> >> > flag... there are only 3 options for the optimizer:
>> >> >
>> >> > 1. GradientDescent
>> >> > 2. Quasi-Newton
>> >> > 3. Newton
>> >> >
>> >> > Maybe we add an enum for the optimization type... and then under the
>> >> > GradientDescent family people can add their variants of SGD.... Not sure
>> >> > if ConjugateGradient comes under 1 or 2... maybe we need 4 options...
>> >> >
>> >> > Thanks.
>> >> > Deb
>> >> >
>> >> >
>> >> > On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>> >> >>
>> >> >> I got your checkin.... I need to run logistic regression SGD vs BFGS for
>> >> >> my current use-cases, but your next checkin will update logistic
>> >> >> regression with LBFGS, right? Are you adding it to the regression
>> >> >> package as well?
>> >> >>
>> >> >> Thanks.
>> >> >> Deb
>> >> >>
>> >> >>
>> >> >> On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai <dbt...@stanford.edu> wrote:
>> >> >>>
>> >> >>> Hi guys,
>> >> >>>
>> >> >>> The latest PR uses Breeze's L-BFGS implementation, which is introduced
>> >> >>> by Xiangrui's sparse input format work in SPARK-1212.
>> >> >>>
>> >> >>> https://github.com/apache/spark/pull/353
>> >> >>>
>> >> >>> Now it works with the new sparse framework!
>> >> >>>
>> >> >>> Any feedback would be greatly appreciated.
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>> Sincerely,
>> >> >>>
>> >> >>> DB Tsai
>> >> >>> -------------------------------------------------------
>> >> >>> My Blog: https://www.dbtsai.com
>> >> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Apr 3, 2014 at 5:02 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> > ---------- Forwarded message ----------
>> >> >>> > From: David Hall <d...@cs.berkeley.edu>
>> >> >>> > Date: Sat, Mar 15, 2014 at 10:02 AM
>> >> >>> > Subject: Re: MLLib - Thoughts about refactoring Updater for LBFGS?
>> >> >>> > To: DB Tsai <dbt...@alpinenow.com>
>> >> >>> >
>> >> >>> >
>> >> >>> > On Fri, Mar 7, 2014 at 10:56 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >>
>> >> >>> >> Hi David,
>> >> >>> >>
>> >> >>> >> Please let me know which version of Breeze has LBFGS serializable and
>> >> >>> >> CachedDiffFunction built into LBFGS once you finish. I'll update the
>> >> >>> >> Spark PR from the RISO implementation to the Breeze implementation.
>> >> >>> >
>> >> >>> >
>> >> >>> > The current master (0.7-SNAPSHOT) has these problems fixed.
>> >> >>> >
>> >> >>> >>
>> >> >>> >> Thanks.
>> >> >>> >>
>> >> >>> >> Sincerely,
>> >> >>> >>
>> >> >>> >> DB Tsai
>> >> >>> >> Machine Learning Engineer
>> >> >>> >> Alpine Data Labs
>> >> >>> >> --------------------------------------
>> >> >>> >> Web: http://alpinenow.com/
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Thu, Mar 6, 2014 at 4:26 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >> >>> >> > On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >> >
>> >> >>> >> >> Hi David,
>> >> >>> >> >>
>> >> >>> >> >> I can converge to the same result with your breeze LBFGS and Fortran
>> >> >>> >> >> implementations now. Probably I made some mistakes when I tried
>> >> >>> >> >> breeze before. I apologize for claiming it's not stable.
>> >> >>> >> >> >> >> >>> >> >> See the test case in BreezeLBFGSSuite.scala >> >> >>> >> >> https://github.com/AlpineNow/spark/tree/dbtsai-breezeLBFGS >> >> >>> >> >> >> >> >>> >> >> This is training multinomial logistic regression against iris >> >> >>> >> >> dataset, >> >> >>> >> >> and both optimizers can train the models with 98% training >> >> >>> >> >> accuracy. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > great to hear! There were some bugs in LBFGS about 6 months >> >> >>> >> > ago, >> >> >>> >> > so >> >> >>> >> > depending on the last time you tried it, it might indeed have >> >> >>> >> > been >> >> >>> >> > bugged. >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> There are two issues to use Breeze in Spark, >> >> >>> >> >> >> >> >>> >> >> 1) When the gradientSum and lossSum are computed >> >> >>> >> >> distributively >> >> >>> >> >> in >> >> >>> >> >> custom defined DiffFunction which will be passed into your >> >> >>> >> >> optimizer, >> >> >>> >> >> Spark will complain LBFGS class is not serializable. In >> >> >>> >> >> BreezeLBFGS.scala, I've to convert RDD to array to make it >> >> >>> >> >> work >> >> >>> >> >> locally. It should be easy to fix by just having LBFGS to >> >> >>> >> >> implement >> >> >>> >> >> Serializable. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > I'm not sure why Spark should be serializing LBFGS? Shouldn't >> >> >>> >> > it >> >> >>> >> > live on >> >> >>> >> > the controller node? Or is this a per-node thing? >> >> >>> >> > >> >> >>> >> > But no problem to make it serializable. >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> 2) Breeze computes redundant gradient and loss. See the >> >> >>> >> >> following >> >> >>> >> >> log >> >> >>> >> >> from both Fortran and Breeze implementations. >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > Err, yeah. I should probably have LBFGS do this automatically, >> >> >>> >> > but >> >> >>> >> > there's >> >> >>> >> > a CachedDiffFunction that gets rid of the redundant >> >> >>> >> > calculations. >> >> >>> >> > >> >> >>> >> > -- David >> >> >>> >> > >> >> >>> >> > >> >> >>> >> >> >> >> >>> >> >> Thanks. 
>> >> >>> >> >> >> >> >>> >> >> Fortran: >> >> >>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0 >> >> >>> >> >> Iteration 0: loss 1.5846343143210866, diff >> >> >>> >> >> 0.14307193024217352 >> >> >>> >> >> Iteration 1: loss 1.1242501524477688, diff >> >> >>> >> >> 0.29053004039012126 >> >> >>> >> >> Iteration 2: loss 1.0930151243303563, diff >> >> >>> >> >> 0.027782962952189336 >> >> >>> >> >> Iteration 3: loss 1.054036932835569, diff 0.03566113127440601 >> >> >>> >> >> Iteration 4: loss 0.9907956302751622, diff >> >> >>> >> >> 0.05999907649459571 >> >> >>> >> >> Iteration 5: loss 0.9184205380342829, diff >> >> >>> >> >> 0.07304737423337761 >> >> >>> >> >> Iteration 6: loss 0.8259870936519937, diff >> >> >>> >> >> 0.10064381175132982 >> >> >>> >> >> Iteration 7: loss 0.6327447552109574, diff >> >> >>> >> >> 0.23395293458364716 >> >> >>> >> >> Iteration 8: loss 0.5534101162436359, diff 0.1253815427665277 >> >> >>> >> >> Iteration 9: loss 0.4045020086612566, diff >> >> >>> >> >> 0.26907321376758075 >> >> >>> >> >> Iteration 10: loss 0.3078824990823728, diff >> >> >>> >> >> 0.23885980452569627 >> >> >>> >> >> >> >> >>> >> >> Breeze: >> >> >>> >> >> Iteration -1: loss 1.3862943611198926, diff 1.0 >> >> >>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit> >> >> >>> >> >> WARNING: Failed to load implementation from: >> >> >>> >> >> com.github.fommil.netlib.NativeSystemBLAS >> >> >>> >> >> Mar 6, 2014 3:59:11 PM com.github.fommil.netlib.BLAS <clinit> >> >> >>> >> >> WARNING: Failed to load implementation from: >> >> >>> >> >> com.github.fommil.netlib.NativeRefBLAS >> >> >>> >> >> Iteration 0: loss 1.3862943611198926, diff 0.0 >> >> >>> >> >> Iteration 1: loss 1.5846343143210866, diff >> >> >>> >> >> 0.14307193024217352 >> >> >>> >> >> Iteration 2: loss 1.1242501524477688, diff >> >> >>> >> >> 0.29053004039012126 >> >> >>> >> >> Iteration 3: loss 1.1242501524477688, diff 0.0 >> >> >>> >> >> Iteration 4: loss 1.1242501524477688, diff 0.0 >> >> >>> >> >> Iteration 5: loss 1.0930151243303563, diff >> >> >>> >> >> 0.027782962952189336 >> >> >>> >> >> Iteration 6: loss 1.0930151243303563, diff 0.0 >> >> >>> >> >> Iteration 7: loss 1.0930151243303563, diff 0.0 >> >> >>> >> >> Iteration 8: loss 1.054036932835569, diff 0.03566113127440601 >> >> >>> >> >> Iteration 9: loss 1.054036932835569, diff 0.0 >> >> >>> >> >> Iteration 10: loss 1.054036932835569, diff 0.0 >> >> >>> >> >> Iteration 11: loss 0.9907956302751622, diff >> >> >>> >> >> 0.05999907649459571 >> >> >>> >> >> Iteration 12: loss 0.9907956302751622, diff 0.0 >> >> >>> >> >> Iteration 13: loss 0.9907956302751622, diff 0.0 >> >> >>> >> >> Iteration 14: loss 0.9184205380342829, diff >> >> >>> >> >> 0.07304737423337761 >> >> >>> >> >> Iteration 15: loss 0.9184205380342829, diff 0.0 >> >> >>> >> >> Iteration 16: loss 0.9184205380342829, diff 0.0 >> >> >>> >> >> Iteration 17: loss 0.8259870936519939, diff >> >> >>> >> >> 0.1006438117513297 >> >> >>> >> >> Iteration 18: loss 0.8259870936519939, diff 0.0 >> >> >>> >> >> Iteration 19: loss 0.8259870936519939, diff 0.0 >> >> >>> >> >> Iteration 20: loss 0.6327447552109576, diff 0.233952934583647 >> >> >>> >> >> Iteration 21: loss 0.6327447552109576, diff 0.0 >> >> >>> >> >> Iteration 22: loss 0.6327447552109576, diff 0.0 >> >> >>> >> >> Iteration 23: loss 0.5534101162436362, diff >> >> >>> >> >> 0.12538154276652747 >> >> >>> >> >> Iteration 24: loss 0.5534101162436362, diff 0.0 >> >> >>> >> >> Iteration 25: loss 0.5534101162436362, diff 0.0 >> >> >>> >> >> Iteration 26: loss 
>> >> >>> >> >>
>> >> >>> >> >> Sincerely,
>> >> >>> >> >>
>> >> >>> >> >> DB Tsai
>> >> >>> >> >> Machine Learning Engineer
>> >> >>> >> >> Alpine Data Labs
>> >> >>> >> >> --------------------------------------
>> >> >>> >> >> Web: http://alpinenow.com/
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Mar 5, 2014 at 2:00 PM, David Hall <d...@cs.berkeley.edu> wrote:
>> >> >>> >> >> > On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai <dbt...@alpinenow.com> wrote:
>> >> >>> >> >> >
>> >> >>> >> >> >> Hi David,
>> >> >>> >> >> >>
>> >> >>> >> >> >> On Tue, Mar 4, 2014 at 8:13 PM, dlwh <david.lw.h...@gmail.com> wrote:
>> >> >>> >> >> >> > I'm happy to help fix any problems. I've verified at points that
>> >> >>> >> >> >> > the implementation gives the exact same sequence of iterates for a
>> >> >>> >> >> >> > few different functions (with a particular line search) as the C
>> >> >>> >> >> >> > port of lbfgs. So I'm a little surprised it fails where Fortran
>> >> >>> >> >> >> > succeeds... but only a little. This was fixed late last year.
>> >> >>> >> >> >>
>> >> >>> >> >> >> I'm working on a reproducible test case using the breeze vs. fortran
>> >> >>> >> >> >> implementations to show the problem I've run into. The test will be
>> >> >>> >> >> >> in one of the test cases in my Spark fork; is it okay for you to
>> >> >>> >> >> >> investigate the issue there? Or do I need to make it a standalone
>> >> >>> >> >> >> test?
>> >> >>> >> >> >>
>> >> >>> >> >> >
>> >> >>> >> >> > Um, as long as it wouldn't be too hard to pull out.
>> >> >>> >> >> >
>> >> >>> >> >> >
>> >> >>> >> >> >>
>> >> >>> >> >> >> Will send you the test later today.
>> >> >>> >> >> >>
>> >> >>> >> >> >> Thanks.
>> >> >>> >> >> >>
>> >> >>> >> >> >> Sincerely,
>> >> >>> >> >> >>
>> >> >>> >> >> >> DB Tsai
>> >> >>> >> >> >> Machine Learning Engineer
>> >> >>> >> >> >> Alpine Data Labs
>> >> >>> >> >> >> --------------------------------------
>> >> >>> >> >> >> Web: http://alpinenow.com/
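Putting the pieces of this thread together, the distributed gradient/loss computation that gets handed to Breeze could look roughly like the sketch below. This is a hand-rolled binary logistic loss for illustration, not the actual MLlib code (and it is not numerically stable for large margins); only the per-row gradient work runs on the workers, while LBFGS itself stays on the driver:

import breeze.linalg.DenseVector
import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS}
import org.apache.spark.rdd.RDD

// Sketch: binary logistic loss/gradient as a distributed sum over
// (label, features) rows, exposed as a Breeze DiffFunction so it can be
// handed to LBFGS on the driver.
class DistributedLogisticLoss(data: RDD[(Double, DenseVector[Double])])
    extends DiffFunction[DenseVector[Double]] with Serializable {

  private val numExamples = data.count().toDouble

  override def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
    // Per-row loss and gradient, computed and summed on the workers.
    val (lossSum, gradSum) = data
      .map { case (label, x) =>
        val margin = w dot x
        // loss = log(1 + exp(margin)) - label * margin, with label in {0, 1}
        val loss = math.log1p(math.exp(margin)) - label * margin
        // gradient = (sigmoid(margin) - label) * x
        val grad = x * (1.0 / (1.0 + math.exp(-margin)) - label)
        (loss, grad)
      }
      .reduce { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
    (lossSum / numExamples, gradSum / numExamples)
  }
}

// Usage sketch:
//   val costFun = new DistributedLogisticLoss(trainingData)
//   val weights = new LBFGS[DenseVector[Double]](50, 10)
//     .minimize(new CachedDiffFunction(costFun), DenseVector.zeros[Double](numFeatures))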