Thanks Xiaokai. I've created a pull request to merge the features from my PR into your repo. Please review it here: https://github.com/xwei-datageek/spark/pull/2
As for GLMs, here at Sina we are working on predicting the number of visitors who read a particular news article or watch an online sports live stream in a given period. I'm trying to improve the predictions by tuning features and incorporating new models, so I'll try Gamma regression later. Thanks for the implementation.

Cheers,
-Gang

On Jun 29, 2014, at 8:17 AM, xwei <weixiao...@gmail.com> wrote:

> Hi Gang,
>
> No worries!
>
> I agree LBFGS would converge faster and your test suite is more
> comprehensive. I'd like to merge my branch with yours.
>
> I also agree with your viewpoint on the redundancy issue. Different GLMs
> usually differ only in the gradient calculation, but the ****regression.scala
> files are quite similar. For example, linearRegressionSGD,
> logisticRegressionSGD, RidgeRegressionSGD, poissonRegressionSGD all share
> quite a bit of common code in their class implementations. Since such
> redundancy is already there in the legacy code, simply merging Poisson and
> Gamma does not seem to help much. So I suggest we just leave them as separate
> classes for the time being.
>
> Best regards,
>
> Xiaokai
>
> On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers List]
> wrote:
>
>> Hi Xiaokai,
>>
>> My bad. I didn't notice this before I created another PR for Poisson
>> regression. The mails were buried in junk by the corp mail master. Also,
>> thanks for considering my comments and advice in your PR.
>>
>> Adding my two cents here:
>>
>> * PoissonRegressionModel and GammaRegressionModel have the same fields and
>> prediction method. Shall we use one class instead of two redundant ones?
>> Say, a LogLinearModel.
>> * The LBFGS optimizer takes fewer iterations and results in better
>> convergence than SGD. I implemented two GeneralizedLinearAlgorithm classes
>> using LBFGS and SGD respectively. You may take a look at them. If it's OK
>> with you, I'd be happy to send a PR to your branch.
>> * In addition to the generated test data, we may use some real-world data
>> for testing. In my implementation, I added the test data from
>> https://onlinecourses.science.psu.edu/stat504/node/223. Please check my test
>> suite.
>>
>> -Gang
>> Sent from my iPad
>>
>>> On Jun 27, 2014, at 6:03 PM, "xwei" <[hidden email]> wrote:
>>>
>>> Yes, that's what we did: adding two gradient functions to Gradient.scala
>>> and creating PoissonRegression and GammaRegression using these gradients.
>>> We made a PR on this.
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7088.html
>>> Sent from the Apache Spark Developers List mailing list archive at
>>> Nabble.com.
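
For readers following the thread, two of its points can be illustrated with a small sketch: the GLMs discussed differ mainly in their gradient, and the Poisson and Gamma models share the same fields and prediction method. The sketch is plain Python, not the Scala code from either PR, and it assumes a log link for both models. The names `poisson_gradient`, `gamma_gradient`, and `dot` are hypothetical, and `LogLinearModel` only borrows the name suggested in the thread; none of this is MLlib's API.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def poisson_gradient(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Per-sample gradient of the Poisson negative log-likelihood.

    Assumes a log link, so mu = exp(w.x) and the loss is mu - y * (w.x)
    up to a constant that does not depend on w.
    """
    mu = math.exp(dot(w, x))
    return [(mu - y) * xi for xi in x]


def gamma_gradient(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Per-sample gradient of the Gamma negative log-likelihood.

    Assumes a log link and a fixed shape parameter, so the loss is
    (w.x) + y * exp(-(w.x)) up to the constant shape factor.
    """
    mu = math.exp(dot(w, x))
    return [(1.0 - y / mu) * xi for xi in x]


@dataclass
class LogLinearModel:
    """Shared model for both regressions: weights, intercept, exp() prediction."""

    weights: List[float]
    intercept: float = 0.0

    def predict(self, x: Sequence[float]) -> float:
        return math.exp(dot(self.weights, x) + self.intercept)


if __name__ == "__main__":
    w = [0.0, 0.0]
    x = [1.0, 2.0]
    print(poisson_gradient(w, x, 3.0))
    print(gamma_gradient(w, x, 0.5))
    print(LogLinearModel(w).predict(x))
```

Only the two gradient functions differ; the fitted weights are used in exactly the same way at prediction time. Under the log-link assumption, that is why a single model class could serve both regressions even if the training classes stay separate.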