[
https://issues.apache.org/jira/browse/SPARK-20359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-20359:
------------------------------------
Assignee: (was: Apache Spark)
> Catalyst EliminateOuterJoin optimization can cause NPE
> ------------------------------------------------------
>
> Key: SPARK-20359
> URL: https://issues.apache.org/jira/browse/SPARK-20359
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Environment: spark master at commit
> 35e5ae4f81176af52569c465520a703529893b50 (Sun Apr 16)
> Reporter: koert kuipers
> Fix For: 2.2.0
>
>
> We were running into an NPE in one of our UDFs for Spark SQL.
>
> This particular function indeed could not handle nulls, but that was by
> design, since null input was never allowed (and we would want it to blow up
> if a null ever came in).
> We realized the issue was not in our data: after we added filters for nulls,
> the NPE still happened. We then also saw the NPE when merely calling
> dataframe.explain instead of running our job.
> It turns out the issue is in EliminateOuterJoin.canFilterOutNull, where a row
> of all nulls is fed into the expression as a test. It's the line:
> val v = boundE.eval(emptyRow)
> I believe it is a bug to assume the expression can always handle nulls.
> For example, this fails:
> {noformat}
> import org.apache.spark.sql.functions.udf
> import spark.implicits._  // assumes a SparkSession named spark, as in spark-shell
>
> val df1 = Seq("a", "b", "c").toDF("x")
>   .withColumn("y", udf { (x: String) => x.substring(0, 1) + "!" }.apply($"x"))
> val df2 = Seq("a", "b").toDF("x1")
> df1
>   .join(df2, df1("x") === df2("x1"), "left_outer")
>   .filter($"x1".isNotNull || !$"y".isin("a!"))
>   .count
> {noformat}
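> The core of the problem can be sketched without Spark at all (a minimal
> illustration; survivesNullProbe is a hypothetical stand-in for the
> optimizer's eval(emptyRow) probe, not a Spark API):
> {noformat}
> // The null-intolerant UDF body from the example above.
> val f: String => String = x => x.substring(0, 1) + "!"
>
> // Simulates canFilterOutNull feeding an all-null row into the predicate.
> def survivesNullProbe(g: String => String): Boolean =
>   try { g(null); true } catch { case _: NullPointerException => false }
>
> assert(!survivesNullProbe(f))  // the probe hits the NPE
>
> // A null-tolerant variant survives the same probe.
> val safe: String => String =
>   x => if (x == null) null else x.substring(0, 1) + "!"
> assert(survivesNullProbe(safe))
> {noformat}
> Guarding the UDF body against null input like this is one possible
> workaround, but the underlying assumption in the optimizer still seems
> wrong.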
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]