[ https://issues.apache.org/jira/browse/SPARK-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352487#comment-15352487 ]
Mahmoud Rawas commented on SPARK-16235:
---------------------------------------
On one hand, the API does not prevent us from calculating MSE on a
classification model, and I gave MSE only as an example. Basically, the model
predicts the probability that a categorical value falls into one of the 2
classes (0 or 1, i.e. false or true), and calculating the MSE indicates how
close the model's predicted values are to the actual values.
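As an illustration of the point above, here is a minimal sketch of MSE computed directly between predicted probabilities in [0,1] and binary labels in {0,1}. The object name `ProbabilityMse` and the sample values are hypothetical and only meant to show the calculation, not MLlib's implementation.

```scala
// Sketch: MSE between predicted probabilities in [0,1] and binary labels in {0,1}.
// ProbabilityMse is a hypothetical helper; the sample values in main are made up.
object ProbabilityMse {
  def mse(labels: Seq[Double], probs: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.size == probs.size,
      "labels and probs must be non-empty and the same length")
    val squaredErrors = labels.zip(probs).map { case (y, p) => (p - y) * (p - y) }
    squaredErrors.sum / labels.size
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(1.0, 0.0, 1.0, 0.0)
    val probs  = Seq(0.9, 0.2, 0.6, 0.1)
    // Smaller values mean the predicted probabilities sit closer to the true labels.
    println(f"MSE = ${mse(labels, probs)}%.4f")
  }
}
```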
On the other hand, calculating this error at each iteration lets us find the
point where the model starts to over-fit: we can deliberately over-train the
model and then pick the iteration with the minimum error.
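A minimal sketch of picking that iteration, assuming we already have one error value per iteration (for example the array returned by MLlib's `evaluateEachIteration`, where element i is assumed to be the error of the first i + 1 trees). `BestIteration` and the error values are hypothetical.

```scala
// Sketch: choose the ensemble size with the lowest per-iteration error.
// Assumption: errors(i) is the error of the first i + 1 trees.
// BestIteration is a hypothetical helper; the values in main are invented.
object BestIteration {
  def bestNumTrees(errors: Array[Double]): Int = {
    require(errors.nonEmpty, "need at least one iteration")
    // minBy keeps the first minimum it sees, so ties favour the smaller ensemble.
    val (_, bestIndex) = errors.zipWithIndex.minBy(_._1)
    bestIndex + 1
  }

  def main(args: Array[String]): Unit = {
    // Error falls, then rises again once the model starts to over-fit.
    val errors = Array(0.30, 0.22, 0.18, 0.17, 0.19, 0.21)
    println(s"Lowest error after ${bestNumTrees(errors)} trees")
  }
}
```

Keeping only the first `bestNumTrees` trees is one simple way to act on the result; early stopping during training is an alternative.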
It is also worth mentioning the probability itself: it would be a good idea to
expose it so users can choose their own cut-off value, instead of MLlib fixing
it on their behalf at the midpoint (this is a separate discussion; I will move
it to a new ticket).
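To make the cut-off idea concrete, here is a minimal sketch of thresholding a positive-class probability with a caller-chosen value. `classify` is a hypothetical helper, not an existing MLlib API, and it assumes the probability is already available.

```scala
// Sketch: classify with a caller-chosen cut-off on the positive-class probability
// instead of a fixed midpoint. CustomCutoff.classify is hypothetical, not an MLlib API.
object CustomCutoff {
  def classify(prob: Double, threshold: Double = 0.5): Double = {
    require(threshold >= 0.0 && threshold <= 1.0, "threshold must be in [0, 1]")
    if (prob > threshold) 1.0 else 0.0
  }

  def main(args: Array[String]): Unit = {
    val probs = Seq(0.15, 0.45, 0.55, 0.85)
    println(probs.map(p => classify(p)))      // midpoint cut-off
    println(probs.map(p => classify(p, 0.3))) // lower cut-off, labels more examples positive
  }
}
```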
> "evaluateEachIteration" is returning wrong results when calculated for
> classification model.
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-16235
> URL: https://issues.apache.org/jira/browse/SPARK-16235
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1, 1.6.2, 2.0.0
> Reporter: Mahmoud Rawas
>
> Basically, within the mentioned function there is code that maps the actual
> label, which is expected to be in the range [0,1], into the range [-1,1], in
> order to make it compatible with the predicted value produced by a
> classification model.
> {code}
> val remappedData = algo match {
>   case Classification => data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
>   case _ => data
> }
> {code}
> The problem with this approach is that it computes an incorrect error: for
> example, the MSE will be 4 times larger than the actual expected MSE.
> Instead, we should map the predicted value into a probability value in [0,1].
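The factor of 4 follows from the affine map itself, assuming the prediction is the same affine image 2p - 1 of a probability p: (2p - 1) - (2y - 1) = 2(p - y), and squaring gives 4(p - y)^2, so every squared error and hence the MSE is scaled by exactly 4. The sketch below checks this numerically and shows the inverse map back to [0,1] suggested in the issue. `RemapScaling` and the sample values are hypothetical; if the model's raw output is an unbounded margin rather than a value in [-1,1], a different (e.g. logistic) transform would be needed, which this sketch does not cover.

```scala
// Sketch: applying z => 2z - 1 to both labels and predictions multiplies the MSE by 4.
// Assumption: predictions lie in [-1, 1] and are the same affine image of a probability.
// RemapScaling is a hypothetical helper; the sample values in main are made up.
object RemapScaling {
  def mse(labels: Seq[Double], preds: Seq[Double]): Double =
    labels.zip(preds).map { case (y, p) => (p - y) * (p - y) }.sum / labels.size

  // Same mapping the quoted snippet applies to labels: {0, 1} -> {-1, 1}.
  def toSigned(z: Double): Double = 2.0 * z - 1.0

  // Inverse mapping proposed in the issue: [-1, 1] -> [0, 1].
  def toProbability(s: Double): Double = (s + 1.0) / 2.0

  def main(args: Array[String]): Unit = {
    val labels = Seq(1.0, 0.0, 1.0, 0.0)
    val probs  = Seq(0.9, 0.2, 0.6, 0.1)
    val probMse   = mse(labels, probs)
    val signedMse = mse(labels.map(toSigned), probs.map(toSigned))
    println(f"MSE on [0,1]: $probMse%.4f, MSE on [-1,1]: $signedMse%.4f, ratio: ${signedMse / probMse}%.2f")
    // Mapping signed predictions back to probabilities restores the original error.
    val restored = mse(labels, probs.map(toSigned).map(toProbability))
    println(f"MSE after mapping predictions back: $restored%.4f")
  }
}
```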
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)