Sean Owen commented on SPARK-17987:

I think this is on purpose. There's no meaningful way to handle a missing value 
in an evaluation like this, so any solution would amount to imputing a value, and 
the caller can and should do that. That is, what is the contribution to the error 
metric in this case? 0? Infinity? NaN? All of those are problematic.
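
To make the point concrete, here is a minimal sketch in plain Python (not Spark code) of why the choice belongs to the caller. The rows, the `rmse` and `is_missing` helpers, and the two policies are hypothetical and only for illustration; the same data gives a different RMSE depending on whether missing predictions are dropped or imputed.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (label, prediction) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def is_missing(p):
    """Treat both None (null) and NaN as a missing prediction."""
    return p is None or (isinstance(p, float) and math.isnan(p))


# Hypothetical (label, prediction) rows; two predictions are missing.
rows = [(1.0, 1.0), (2.0, None), (3.0, 5.0), (4.0, float("nan"))]

# Policy A: ignore rows with a missing prediction.
kept = [(y, p) for y, p in rows if not is_missing(p)]
rmse_dropped = rmse(kept)

# Policy B: impute missing predictions with the mean observed prediction.
mean_pred = sum(p for _, p in kept) / len(kept)
imputed = [(y, mean_pred if is_missing(p) else p) for y, p in rows]
rmse_imputed = rmse(imputed)

print(rmse_dropped, rmse_imputed)
```

Neither result is "the" error of the model. Each one reflects a policy decision about missing predictions, which is why it seems reasonable to have the caller make that decision explicitly before evaluation.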

> ML Evaluator fails to handle null values in the dataset
> -------------------------------------------------------
>                 Key: SPARK-17987
>                 URL: https://issues.apache.org/jira/browse/SPARK-17987
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.6.2, 2.0.1
>            Reporter: bo song
> Take the RegressionEvaluator as an example: when the predictionCol is null in 
> a row, an exception "scala.MatchError" is thrown. A null prediction is a 
> common case; for example, when a predictor is missing or its value is out of 
> bounds, most machine learning models cannot produce a correct prediction and 
> return null instead. Evaluators should handle null values rather than throw 
> an exception; the common way to handle missing values is to ignore them. 
> Besides null values, NaN values need to be handled correctly too. 
> The three evaluators RegressionEvaluator, BinaryClassificationEvaluator, and 
> MulticlassClassificationEvaluator all have the same problem.
