[jira] [Commented] (SPARK-16235) "evaluateEachIteration" is returning wrong results when calculated for classification model.

Mahmoud Rawas (JIRA) Mon, 27 Jun 2016 23:59:35 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352487#comment-15352487
 ]


Mahmoud Rawas commented on SPARK-16235:
---------------------------------------

On one hand, the API does not stop us from calculating MSE on a classification 
model, and I gave MSE only as an example. Basically, the model predicts a 
probability that a categorical value falls into one of the two cases (0 or 1, 
false or true), so calculating MSE gives an indication of how close the model's 
predicted values are to the actual values.
On the other hand, when we calculate this error at each iteration we can 
figure out the point where the model starts to over-fit: over-train the model, 
then take the iteration with the minimum error across all iterations.
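
A minimal sketch of picking that point, assuming the per-iteration errors have 
already been computed on a held-out set. The values are made up for 
illustration, and `validationErrors` and `bestNumIterations` are hypothetical 
names, not part of the API:

```scala
// Hypothetical per-iteration validation errors; element i is the error
// of the ensemble that uses the first i + 1 iterations.
val validationErrors = Array(0.31, 0.24, 0.19, 0.17, 0.18, 0.21)

// Index of the smallest error, converted to a 1-based iteration count.
val bestNumIterations = validationErrors.zipWithIndex.minBy(_._1)._2 + 1
```

With these numbers the error stops improving after the fourth iteration, so 
that is where over-fitting would be assumed to start.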

It is also worth mentioning the probability, as it would be a good idea to 
expose it to the user so they can change the cut-off value, instead of mllib 
applying it on the user's behalf at mid-range (this is a different discussion; 
I will move it to a new ticket).

> "evaluateEachIteration" is returning wrong results when calculated for 
> classification model.
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16235
>                 URL: https://issues.apache.org/jira/browse/SPARK-16235
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0
>            Reporter: Mahmoud Rawas
>
> Basically, within the mentioned function there is code that maps the actual 
> value, which is supposed to be in the range \[0,1], into the range \[-1,1], 
> in order to make it compatible with the predicted value produced by a 
> classification model. 
> {code}
> val remappedData = algo match {
>   case Classification =>
>     data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
>   case _ => data
> }
> {code}
> The problem with this approach is that it calculates an incorrect error; 
> for example, the MSE will be 4 times larger than the expected MSE. 
> Instead, we should map the predicted value into a probability value in [0,1].
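
The factor of 4 can be checked with a small sketch in plain Scala (no Spark 
needed). It assumes the prediction in \[-1,1] is a linear rescaling of a 
probability in \[0,1], which is the reading used in this ticket and not 
something confirmed against the mllib source. The sample values and the names 
`labels01`, `probs01`, `mse` and `toSigned` are made up for illustration:

```scala
// Illustrative labels in {0, 1} and predicted probabilities in [0, 1].
val labels01 = Seq(0.0, 1.0, 1.0, 0.0)
val probs01  = Seq(0.2, 0.9, 0.6, 0.4)

// Plain mean squared error over paired values.
def mse(actual: Seq[Double], predicted: Seq[Double]): Double =
  actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }.sum / actual.size

// The same remapping as in the quoted snippet: [0, 1] -> [-1, 1].
def toSigned(v: Double): Double = v * 2 - 1

val mse01     = mse(labels01, probs01)
val mseSigned = mse(labels01.map(toSigned), probs01.map(toSigned))

// (2a - 1) - (2p - 1) = 2(a - p), so every squared error is scaled by 4.
val ratio = mseSigned / mse01
```

Under that assumption `ratio` comes out as 4 (up to floating-point rounding) 
for any choice of labels and probabilities, since each difference is doubled 
before it is squared.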



