[
https://issues.apache.org/jira/browse/SPARK-9837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994364#comment-14994364
]
Soila Kavulya commented on SPARK-9837:
--------------------------------------
[~mengxr] It is open-sourced. We compute the gradients needed for the Hessian
matrix using the same cost function that L-BFGS or SGD uses for logistic
regression. We could also add the cost function for IRLS. The implementation
assumes that the Hessian matrix fits in the memory of a single machine.
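To give a rough sense of what the single-machine assumption implies, here is a
back-of-the-envelope sketch. It assumes a dense d x d matrix of 8-byte doubles
and ignores any overhead; the function name is made up for illustration and is
not part of the linked code.

```python
def dense_hessian_bytes(num_features, bytes_per_value=8):
    """Memory needed for a dense num_features x num_features matrix.

    Assumption: one 8-byte double per entry, no object or header overhead.
    """
    return num_features * num_features * bytes_per_value


if __name__ == "__main__":
    for d in (1_000, 10_000, 100_000):
        gigabytes = dense_hessian_bytes(d) / 1e9
        print(f"{d:>7} features -> {gigabytes:.3f} GB")
```

The footprint grows quadratically with the number of features, so the
assumption is comfortable for thousands of features and becomes a constraint
somewhere in the tens of thousands.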
We have contributed the code for computing the empirical Hessian to Scala
Breeze, so it should be in their next release:
https://github.com/scalanlp/breeze/blob/f4c326f9f219859156a8c55bc667369813fa4b52/math/src/main/scala/breeze/optimize/SecondOrderFunction.scala
We compute the empirical Hessian for logistic regression using the weights from
the final iteration:
https://github.com/trustedanalytics/atk/blob/ff5abbc1a8b9ac1544568e79c0b980bdb6cc2908/engine-plugins/model-plugins/src/main/scala/org/apache/spark/mllib/evaluation/ApproximateHessianMatrix.scala
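For readers who want the idea without opening the links, the following is a
minimal sketch of an empirical Hessian for logistic regression, built by taking
finite differences of the gradient at the final weights. It is pure Python and
only illustrative: the function names, the central-difference scheme, and the
step size eps are assumptions of this sketch, not a description of what the
Breeze or ATK code actually does.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(weights, rows, labels):
    """Gradient of the logistic-regression negative log-likelihood."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def empirical_hessian(weights, rows, labels, eps=1e-5):
    """Approximate the Hessian column by column from gradient differences.

    Assumption: central differences with a fixed step eps.
    """
    d = len(weights)
    hessian = [[0.0] * d for _ in range(d)]
    for j in range(d):
        plus = list(weights)
        minus = list(weights)
        plus[j] += eps
        minus[j] -= eps
        g_plus = gradient(plus, rows, labels)
        g_minus = gradient(minus, rows, labels)
        for i in range(d):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * eps)
    # Average with the transpose to remove small asymmetries from rounding.
    return [[0.5 * (hessian[i][j] + hessian[j][i]) for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    # Toy data; the first column is the intercept term.
    rows = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
    labels = [1, 0, 1, 0]
    final_weights = [0.2, -0.4]  # stand-in for the weights from the last iteration
    for row in empirical_hessian(final_weights, rows, labels):
        print(row)
```

Each column costs two gradient evaluations, so a model with d weights needs 2d
passes over the data, and the result is a dense d x d matrix held in memory.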
Then we compute the summary statistics:
https://github.com/trustedanalytics/atk/blob/b8d5d6d01d9680d93e0a10ae769fb306081192c5/engine-plugins/model-plugins/src/main/scala/org/trustedanalytics/atk/engine/model/plugins/classification/glm/LogisticRegressionSummaryTable.scala
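As a rough illustration of how R-like summary statistics can follow from the
Hessian, the sketch below inverts the Hessian of the negative log-likelihood to
get a covariance estimate, then derives standard errors, Wald z statistics, and
two-sided p-values under a normal approximation. The function names and the
returned fields are assumptions of this sketch; the linked
LogisticRegressionSummaryTable may organize or compute things differently.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Hessian is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def summary_table(weights, hessian):
    """Per-coefficient estimate, standard error, z statistic, and p-value.

    Assumption: hessian is the Hessian of the negative log-likelihood at the
    fitted weights, so its inverse estimates the coefficient covariance.
    """
    covariance = invert(hessian)
    table = []
    for j, w in enumerate(weights):
        std_error = math.sqrt(covariance[j][j])
        z = w / std_error
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, standard normal
        table.append({"estimate": w, "std_error": std_error,
                      "z": z, "p_value": p_value})
    return table


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    for row in summary_table([1.0, 0.1], [[4.0, 0.0], [0.0, 25.0]]):
        print(row)
```

The inversion step is where the single-machine assumption matters most, since
it operates on the full d x d matrix.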
> Provide R-like summary statistics for GLMs via iteratively reweighted least
> squares
> -----------------------------------------------------------------------------------
>
> Key: SPARK-9837
> URL: https://issues.apache.org/jira/browse/SPARK-9837
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Reporter: Xiangrui Meng
>
> This is similar to SPARK-9836 but for GLMs other than ordinary least squares.