[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-34448:
------------------------------------
Assignee: Apache Spark
> Binary logistic regression incorrectly computes the intercept and
> coefficients when data is not centered
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-34448
> URL: https://issues.apache.org/jira/browse/SPARK-34448
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Yakov Kerzhner
> Assignee: Apache Spark
> Priority: Major
> Labels: correctness
>
> I have written up a fairly detailed gist that includes code to reproduce the
> bug, along with the output of that code and some commentary:
> [https://gist.github.com/ykerzhner/51358780a6a4cc33266515f17bf98a96]
> To summarize: under certain conditions, the minimization that fits a binary
> logistic regression pulls the intercept value towards the log(odds) of the
> target data. This is mathematically correct only when the data comes from
> distributions with zero means. In general, it gives an incorrect intercept,
> and consequently incorrect coefficients as well.
> As I am not very familiar with the Spark code base, I have not been able to
> locate this bug in the Spark code itself. A hint to the bug is here:
> [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L894-L904]
> Based on the code, I don't believe that the features have zero means at this
> point, so this heuristic is incorrect. But an incorrect starting point alone
> does not explain the bug: the minimizer should still drift to the correct
> place. I was not able to find the code of the actual objective function that
> is being minimized.
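
To make the zero-mean point concrete, here is a minimal sketch in plain Python. It is not the code from the gist and does not use or reproduce Spark's implementation. The helper `fit_logistic` and the toy dataset are hypothetical and exist only for illustration. The helper fits an unregularized one-feature logistic regression by Newton's method, once on the raw feature (mean 5) and once on the centered feature. The dataset is deliberately symmetric so that the centered intercept coincides with the log(odds) of the labels, while the raw intercept does not.

```python
import math

def fit_logistic(xs, ys, iters=50):
    """Fit y ~ sigmoid(b0 + b1 * x) by Newton's method (no regularization)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = y - p            # residual: gradient contribution
            w = p * (1.0 - p)    # weight: Hessian contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system H * delta = g explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: feature value -> number of positives out of 10 rows.
counts = {3: 1, 4: 3, 5: 5, 6: 7, 7: 9}
xs, ys = [], []
for x, k in counts.items():
    xs += [float(x)] * 10
    ys += [1.0] * k + [0.0] * (10 - k)

mean_x = sum(xs) / len(xs)
log_odds = math.log(sum(ys) / (len(ys) - sum(ys)))

b0_raw, b1_raw = fit_logistic(xs, ys)                        # uncentered feature
b0_cen, b1_cen = fit_logistic([x - mean_x for x in xs], ys)  # centered feature

print("log(odds) of labels:", log_odds)
print("raw fit      intercept, slope:", b0_raw, b1_raw)
print("centered fit intercept, slope:", b0_cen, b1_cen)
```

Both fits recover the same slope, and the two intercepts are related by `b0_raw = b0_cen - b1 * mean_x`. Only the centered intercept matches the log(odds), so a procedure that pulls the intercept towards the log(odds) while the features are uncentered would move it away from the optimum in a case like this one.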