Re: Contributing to MLlib on GLM

Gang Bai Mon, 30 Jun 2014 20:18:24 -0700

Thanks Xiaokai,

I’ve created a pull request to merge the features from my PR into your repo.
Please review it here: https://github.com/xwei-datageek/spark/pull/2 .


As for GLMs, here at Sina we are working on predicting the number of visitors
who read a particular news article or watch a live online sports stream in a
given period. I’m trying to improve the prediction results by tuning features
and incorporating new models, so I’ll try Gamma regression later. Thanks for
the implementation.
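
To make the points in the quoted discussion below concrete, here is a minimal sketch of why Poisson and Gamma regression can share one log-linear prediction method while differing only in their gradients. It is plain Scala with no Spark dependency and is not the code from either PR; the names LogLinearSketch, predict, poissonGradient and gammaGradient are made up for illustration, and the Gamma gradient assumes a log link with unit shape (any other fixed shape only rescales it by a positive constant).

```scala
// Illustrative only: hypothetical names, not the MLlib API or either PR.
object LogLinearSketch {
  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Shared prediction under a log link: mean = exp(w . x + intercept).
  def predict(w: Array[Double], intercept: Double, x: Array[Double]): Double =
    math.exp(dot(w, x) + intercept)

  // Poisson negative log-likelihood for one example (up to a constant):
  //   exp(w . x) - y * (w . x)
  // so its gradient with respect to w is (exp(w . x) - y) * x.
  def poissonGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val mu = math.exp(dot(w, x))
    x.map(_ * (mu - y))
  }

  // Gamma negative log-likelihood with log link and unit shape (assumed):
  //   (w . x) + y * exp(-(w . x))
  // so its gradient with respect to w is (1 - y / exp(w . x)) * x.
  def gammaGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val mu = math.exp(dot(w, x))
    x.map(_ * (1.0 - y / mu))
  }
}
```

With zero weights the predicted mean is exp(0) = 1, so for x = (2.0) and y = 3.0 both gradients come out to (1 - 3) * 2 = -4.0. The two models start to differ only once the mean moves away from 1.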

Cheers,
-Gang

On Jun 29, 2014, at 8:17 AM, xwei <[email protected]> wrote:

> Hi Gang,
> 
> No worries! 
> 
> I agree LBFGS would converge faster and your test suite is more 
> comprehensive. I'd like to merge my branch with yours.
> 
> I also agree with your point about redundancy. Different GLMs usually
> differ only in their gradient calculation, while the ****regression.scala
> files are quite similar. For example, linearRegressionSGD,
> logisticRegressionSGD, RidgeRegressionSGD, poissonRegressionSGD all share 
> quite a bit of common code in their class implementations. Since such 
> redundancy is already there in the legacy code, simply merging Poisson and 
> Gamma does not seem to help much. So I suggest we just leave them as separate 
> classes for the time being. 
> 
> 
> Best regards,
> 
> Xiaokai
> 
> On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers List] 
> wrote:
> 
>> Hi Xiaokai, 
>> 
>> My bad. I didn't notice this before I created another PR for Poisson
>> regression. The emails were moved to the junk folder by our corporate mail
>> filter. Also, thanks for considering my comments and advice in your PR.
>> 
>> Adding my two cents here: 
>> 
>> * PoissonRegressionModel and GammaRegressionModel have the same fields and 
>> prediction method. Shall we use one instead of two redundant classes? Say, a 
>> LogLinearModel. 
>> * The LBFGS optimizer takes fewer iterations and results in better
>> convergence than SGD. I implemented two GeneralizedLinearAlgorithm classes
>> using LBFGS and SGD respectively. You may take a look at them. If it's OK
>> with you, I'd be happy to send a PR to your branch.
>> * In addition to the generated test data, we may use some real-world data
>> for testing. In my implementation, I added the test data from
>> https://onlinecourses.science.psu.edu/stat504/node/223. Please check my test
>> suite.
>> 
>> -Gang 
>> Sent from my iPad 
>> 
>>> On Jun 27, 2014, at 6:03 PM, "xwei" <[hidden email]> wrote:
>>> 
>>> 
>>> Yes, that's what we did: we added two gradient functions to Gradient.scala
>>> and created PoissonRegression and GammaRegression using these gradients. We
>>> made a PR for this.
