[
https://issues.apache.org/jira/browse/SPARK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-13497:
------------------------------
Priority: Minor (was: Major)
I understand your point, but I think it comes from trying to hack in the idea
of a threshold. softmax and logit would agree for the YES category here. Right
now, the NO category probability is constructed to always equal the threshold.
So if threshold = 0.9, P(no) = 0.9 and P(yes) is what it is; the prediction is
"yes" only when it's higher than 0.9 as desired.
This of course means the probabilities don't sum to 1, and as you point out,
it's not the recommended way of doing it. I don't see any other way to encode
the threshold in the model (?); if there is, then that's what we should also
use.
Changing this unfortunately means the threshold is no longer used in the
prediction, which also seems like a problem.
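To make the encoding described above concrete, here is a minimal sketch of how an evaluator would score the two categories when each one is normalized independently with logit. It is not Spark's exporter or any PMML evaluator's code; the names `sigmoid` and `predict_with_threshold` are hypothetical, and the only assumption taken from the comment is that P(no) is constructed to equal the threshold.

```python
import math


def sigmoid(y: float) -> float:
    """Logit normalization applied to a single linear score."""
    return 1.0 / (1.0 + math.exp(-y))


def predict_with_threshold(y_yes: float, threshold: float):
    """Score YES and NO independently, as per-category logit normalization does.

    Hypothetical helper: P(no) is pinned to the threshold, so the category
    with the larger value is YES only when P(yes) exceeds the threshold.
    """
    p_yes = sigmoid(y_yes)
    p_no = threshold  # constant, by construction
    label = "YES" if p_yes > p_no else "NO"
    return label, p_yes, p_no


if __name__ == "__main__":
    for y in (2.0, 3.0):
        label, p_yes, p_no = predict_with_threshold(y, threshold=0.9)
        # The two values are not a probability distribution: they need not sum to 1.
        print(y, label, round(p_yes, 4), p_no, round(p_yes + p_no, 4))
```

With threshold = 0.9, a linear score of 2.0 gives P(yes) below 0.9 and the prediction is "NO", while a score of 3.0 gives P(yes) above 0.9 and the prediction is "YES". In both cases P(yes) + P(no) is greater than 1.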
> PMML export for logistic regression does not conform to the PMML standard
> -------------------------------------------------------------------------
>
> Key: SPARK-13497
> URL: https://issues.apache.org/jira/browse/SPARK-13497
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.6.0
> Reporter: Chris Papadopoulos
> Priority: Minor
>
> In line 52 of
> spark/mllib/src/main/scala/org/apache/spark/mllib/pmml/export/PMMLModelExportFactory.scala
> the binary classification for n=2 is exported with
> RegressionNormalizationMethodType.LOGIT
> But the PMML standard specifies that it should be a softmax for logistic
> regression:
> http://dmg.org/pmml/v4-2-1/Regression.html
> Quote:
> " Note that Binary logistic regression is a special case with
> y = intercept + Sum_i (coefficient_i * independent_variable_i)
> p = 1/(1+exp(-y))
> It should be implemented as a classification model
> <RegressionModel functionName="classification" normalizationMethod="softmax"
> ...
> <RegressionTable targetCategory="YES" ...
> <RegressionTable targetCategory="NO" intercept="0.0"
> "
> Evaluating with the logit option leads to unexpected behavior.
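The form quoted from the standard can be checked numerically. The sketch below assumes two regression tables, YES with linear score y and NO with a fixed score of 0.0, normalized jointly with softmax. The function names are hypothetical and this is not any PMML evaluator's implementation. It shows that softmax over (y, 0) gives the same P(yes) as 1/(1+exp(-y)), that the two probabilities sum to 1, and that the predicted category flips at P(yes) = 0.5, so no other threshold is expressed in this form.

```python
import math


def sigmoid(y: float) -> float:
    """p = 1 / (1 + exp(-y)), the binary logistic form quoted above."""
    return 1.0 / (1.0 + math.exp(-y))


def binary_softmax(y_yes: float, y_no: float = 0.0):
    """Softmax over the YES and NO scores; NO defaults to intercept 0.0."""
    m = max(y_yes, y_no)  # subtract the max for numerical stability
    e_yes = math.exp(y_yes - m)
    e_no = math.exp(y_no - m)
    total = e_yes + e_no
    return e_yes / total, e_no / total


def predict_softmax(y_yes: float) -> str:
    """Pick the category with the larger softmax probability."""
    p_yes, p_no = binary_softmax(y_yes)
    return "YES" if p_yes > p_no else "NO"


if __name__ == "__main__":
    for y in (-3.0, -0.1, 0.0, 2.0):
        p_yes, p_no = binary_softmax(y)
        print(y, round(p_yes, 4), round(sigmoid(y), 4), predict_softmax(y))
```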