[ 
https://issues.apache.org/jira/browse/SPARK-17987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585082#comment-15585082
 ] 

Sean Owen commented on SPARK-17987:
-----------------------------------

Ignoring rows with a null prediction amounts to assuming that each such row 
has an error equal to the average of the others, i.e. that it is about as 
wrong as everything else on average. But this is a case where no prediction 
was returned at all; is that reasonable? In some cases yes, in many cases no. 
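
To make the arithmetic concrete, here is a small plain-Python sketch. It is not
Spark code; the function names and sample numbers are made up for illustration.
It shows that dropping rows with a missing prediction gives the same mean
squared error as keeping them and charging each one the average squared error
of the remaining rows.

```python
import math


def is_missing(pred):
    """True when a prediction is None or NaN."""
    return pred is None or (isinstance(pred, float) and math.isnan(pred))


def mse_dropping_missing(labels, preds):
    """MSE over only the rows that have a usable prediction."""
    errs = [(y - p) ** 2 for y, p in zip(labels, preds) if not is_missing(p)]
    return sum(errs) / len(errs)


def mse_imputing_mean_error(labels, preds):
    """MSE over all rows, charging each missing row the mean error of the rest."""
    errs = [(y - p) ** 2 for y, p in zip(labels, preds) if not is_missing(p)]
    mean_err = sum(errs) / len(errs)
    n_missing = len(labels) - len(errs)
    return (sum(errs) + n_missing * mean_err) / len(labels)


# Hypothetical data: two of the four rows have no usable prediction.
labels = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, None, 2.0, float("nan")]

dropped = mse_dropping_missing(labels, preds)
imputed = mse_imputing_mean_error(labels, preds)
```

Both values come out the same, so "ignore the nulls" is not a neutral choice:
it silently scores every unpredicted row as average.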

There is a school of thought that all of these variants should be built into 
monolithic library objects. I see this style in R, and to a degree in Python. 
I think Spark's style is more compositional: it is relatively easy to 
implement whatever filtering and transformation you like outside something 
like a core evaluator, so it is better to leave it outside. In R that isn't 
true, hence these become flags or options on a big evaluator object.
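
A rough plain-Python sketch of that compositional style follows. It is not
Spark's API: `rmse`, `drop_missing`, and `penalize_missing` are hypothetical
names, and the data is invented. The evaluator stays small and strict, and the
caller decides what a missing prediction means by composing a step in front of
it.

```python
import math


def is_missing(pred):
    """True when a prediction is None or NaN."""
    return pred is None or (isinstance(pred, float) and math.isnan(pred))


def rmse(rows):
    """Strict evaluator: rows are (label, prediction) pairs with no gaps."""
    rows = list(rows)
    if any(is_missing(p) for _, p in rows):
        raise ValueError("missing prediction; handle it before evaluating")
    return math.sqrt(sum((y - p) ** 2 for y, p in rows) / len(rows))


def drop_missing(rows):
    """Policy 1: ignore rows that have no prediction."""
    return [(y, p) for y, p in rows if not is_missing(p)]


def penalize_missing(rows, fallback):
    """Policy 2: treat a missing prediction as a fixed fallback guess."""
    return [(y, fallback if is_missing(p) else p) for y, p in rows]


# Hypothetical (label, prediction) pairs; two predictions are missing.
rows = [(1.0, 1.0), (2.0, None), (5.0, 5.0), (10.0, float("nan"))]

rmse_ignoring = rmse(drop_missing(rows))
rmse_penalized = rmse(penalize_missing(rows, 0.0))
```

The two policies give very different scores on the same data, which is the
argument for leaving the choice to the caller rather than fixing it with a
flag inside the evaluator.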

> ML Evaluator fails to handle null values in the dataset
> -------------------------------------------------------
>
>                 Key: SPARK-17987
>                 URL: https://issues.apache.org/jira/browse/SPARK-17987
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.6.2, 2.0.1
>            Reporter: bo song
>
> Take the RegressionEvaluator as an example: when the predictionCol is null in 
> a row, an exception "scala.MatchError" is thrown. A missing (null) prediction 
> is a common case. For example, when a predictor is missing or its value is 
> out of bounds, most machine learning models cannot produce a correct 
> prediction, so a null prediction is returned. Evaluators should handle null 
> values instead of throwing an exception; the common way to handle missing 
> values is to ignore them. Besides null values, NaN values need to be handled 
> correctly too. 
> The three evaluators RegressionEvaluator, BinaryClassificationEvaluator and 
> MulticlassClassificationEvaluator have the same problem.


