Hi Gang,

No worries!

I agree that LBFGS would converge faster and that your test suite is more comprehensive. I'd like to merge my branch with yours.

I also agree with your point about redundancy. Different GLMs usually differ only in the gradient calculation, while the ****regression.scala files are otherwise quite similar. For example, linearRegressionSGD, logisticRegressionSGD, RidgeRegressionSGD, and poissonRegressionSGD all share a good deal of common code in their class implementations. Since this redundancy already exists in the legacy code, merging only Poisson and Gamma would not help much. So I suggest we leave them as separate classes for the time being.

Best regards,
Xiaokai

On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers List] wrote:

> Hi Xiaokai,
>
> My bad. I didn't notice this before I created another PR for Poisson
> regression. The mails were buried in junk by the corp mail master. Also,
> thanks for considering my comments and advice in your PR.
>
> Adding my two cents here:
>
> * PoissonRegressionModel and GammaRegressionModel have the same fields and
>   prediction method. Shall we use one class instead of two redundant ones?
>   Say, a LogLinearModel.
> * The LBFGS optimizer takes fewer iterations and results in better
>   convergence than SGD. I implemented two GeneralizedLinearAlgorithm classes
>   using LBFGS and SGD respectively. You may take a look at them. If it's OK
>   with you, I'd be happy to send a PR to your branch.
> * In addition to the generated test data, we may use some real-world data for
>   testing. In my implementation, I added the test data from
>   https://onlinecourses.science.psu.edu/stat504/node/223. Please check my test
>   suite.
>
> -Gang
> Sent from my iPad
>
> > On Jun 27, 2014, at 6:03 PM, "xwei" <[hidden email]> wrote:
> >
> > Yes, that's what we did: we added two gradient functions to Gradient.scala
> > and created PoissonRegression and GammaRegression using these gradients.
> > We made a PR for this.
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7088.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
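
To make the redundancy point concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is not the code from either PR: `LogLinearSketch`, `LogLinearGradient`, `dLossDEta`, and the two gradient objects are hypothetical names, and the intercept is assumed to be handled outside the gradient (for example as an appended constant feature). It shows that with a log link the Poisson and Gamma models can share one prediction method, exp(w . x + b), and differ only in one line of the gradient.

```scala
// Hypothetical sketch; names are illustrative, not MLlib's API.
object LogLinearSketch {

  // Dot product of two equal-length dense vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // One model for both families: with a log link the predicted mean
  // is exp(w . x + intercept).
  final case class LogLinearModel(weights: Array[Double], intercept: Double) {
    def predict(features: Array[Double]): Double =
      math.exp(dot(weights, features) + intercept)
  }

  // Shared gradient skeleton. By the chain rule, the gradient w.r.t. the
  // weights is dLoss/dEta * x, where eta = w . x is the linear predictor.
  trait LogLinearGradient {
    // Family-specific part: derivative of the per-example negative
    // log-likelihood with respect to eta.
    def dLossDEta(eta: Double, label: Double): Double

    def compute(features: Array[Double], label: Double,
                weights: Array[Double]): Array[Double] = {
      val scale = dLossDEta(dot(weights, features), label)
      features.map(_ * scale)
    }
  }

  // Poisson: loss = exp(eta) - y * eta, up to a constant in y.
  object PoissonGradient extends LogLinearGradient {
    def dLossDEta(eta: Double, label: Double): Double = math.exp(eta) - label
  }

  // Gamma with log link and fixed shape: loss is proportional to
  // y * exp(-eta) + eta, up to a constant in y.
  object GammaGradient extends LogLinearGradient {
    def dLossDEta(eta: Double, label: Double): Double =
      1.0 - label * math.exp(-eta)
  }
}
```

Under these assumptions, adding another log-link family would mean adding one more small object, while the model class and the optimizer wiring stay unchanged.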