[
https://issues.apache.org/jira/browse/SPARK-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gang Bai closed SPARK-2303.
---------------------------
Resolution: Fixed
> Poisson regression model for count data
> ---------------------------------------
>
> Key: SPARK-2303
> URL: https://issues.apache.org/jira/browse/SPARK-2303
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Gang Bai
>
> Modeling count data is important for solving real-world statistical
> problems. Currently mllib.regression provides models mostly for continuous
> numeric data, i.e., fitting curves with various regularization on the
> resulting weights, but it still lacks support for count data modeling.
> A very basic model for this is Poisson regression. Following the patterns
> in mllib and reusing its components, we estimate the parameters of
> Poisson regression by maximum likelihood. In detail, to add Poisson
> regression to mllib.regression, we need to:
> # Add the gradient of the negative log-likelihood to
> mllib/optimization/Gradient.scala.
> # Add the implementation of PoissonRegressionModel, which extends
> GeneralizedLinearModel with RegressionModel. Here we need to implement
> the predict method.
> # Add the implementations of the generalized linear algorithm classes. Here we
> can use either GradientDescent or LBFGS as the optimizer, so we implement
> both, as class PoissonRegressionWithSGD and class PoissonRegressionWithLBFGS
> respectively.
> # Add the companion objects PoissonRegressionWithSGD and
> PoissonRegressionWithLBFGS as drivers.
> # Test suites
> ## Test the gradient computation.
> ## Test the regression method using generated data, which requires a
> PoissonRegressionDataGenerator.
> ## Test the regression method using a real-world data set.
> # Add the documentation.
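
For readers following the steps quoted above, here is a minimal sketch of the underlying math in plain Python, using only the standard library. It is not the MLlib implementation and does not use Spark's API. It assumes the usual log link, so the predicted mean is exp(w . x), the per-example negative log-likelihood is exp(w . x) - y * (w . x) + log(y!), and its gradient is (exp(w . x) - y) * x. All function names (predict, gradient, numerical_gradient, generate_data, fit) are hypothetical, and generate_data is only a stand-in for the proposed PoissonRegressionDataGenerator.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Predicted mean count under the (assumed) log link: exp(w . x)."""
    return math.exp(dot(w, x))


def neg_log_likelihood(w, x, y):
    """Per-example Poisson negative log-likelihood, including log(y!)."""
    margin = dot(w, x)
    return math.exp(margin) - y * margin + math.lgamma(y + 1)


def gradient(w, x, y):
    """Gradient of the per-example negative log-likelihood: (exp(w.x) - y) * x."""
    residual = predict(w, x) - y
    return [residual * xi for xi in x]


def numerical_gradient(w, x, y, eps=1e-6):
    """Central finite differences, for cross-checking gradient() in a test."""
    grad = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + eps] + w[j + 1:]
        down = w[:j] + [w[j] - eps] + w[j + 1:]
        diff = neg_log_likelihood(up, x, y) - neg_log_likelihood(down, x, y)
        grad.append(diff / (2 * eps))
    return grad


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def generate_data(rng, true_w, n):
    """Hypothetical data generator: features in [-1, 1], x[0] = 1 as intercept."""
    data = []
    for _ in range(n):
        x = [1.0] + [rng.uniform(-1.0, 1.0) for _ in true_w[1:]]
        data.append((x, sample_poisson(rng, predict(true_w, x))))
    return data


def fit(data, dim, step=0.1, iterations=300):
    """Full-batch gradient descent on the mean negative log-likelihood."""
    w = [0.0] * dim
    for _ in range(iterations):
        total = [0.0] * dim
        for x, y in data:
            for j, g in enumerate(gradient(w, x, y)):
                total[j] += g
        w = [wj - step * tj / len(data) for wj, tj in zip(w, total)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.5, 1.0, -0.5]  # illustrative weights, not from the issue
    fitted = fit(generate_data(rng, true_w, 1000), dim=len(true_w))
    print("true weights:  ", true_w)
    print("fitted weights:", [round(v, 3) for v in fitted])
```

Full-batch gradient descent is used here only to keep the sketch short. The issue proposes stochastic gradient descent and L-BFGS, which would consume the same gradient. The finite-difference check corresponds to the "test the gradient computation" step, and fitting on generated data corresponds to the generated-data test.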
--
This message was sent by Atlassian JIRA
(v6.2#6252)