[ https://issues.apache.org/jira/browse/SPARK-20359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan updated SPARK-20359:
--------------------------------
    Fix Version/s: 2.3.0
                   2.1.1

> Catalyst EliminateOuterJoin optimization can cause NPE
> ------------------------------------------------------
>
>                 Key: SPARK-20359
>                 URL: https://issues.apache.org/jira/browse/SPARK-20359
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: spark master at commit 35e5ae4f81176af52569c465520a703529893b50 (Sun Apr 16)
>            Reporter: koert kuipers
>             Fix For: 2.1.1, 2.2.0, 2.3.0
>
> we were running into an NPE in one of our UDFs for spark sql.
> now this particular function indeed could not handle nulls, but this was by design, since null input was never allowed (and we would want it to blow up if there was a null as input).
> we realized the issue was not in our data when we added filters for nulls and the NPE still happened. then we also saw the NPE when just doing dataframe.explain instead of running our job.
> turns out the issue is in EliminateOuterJoin.canFilterOutNull, where a row with all nulls is fed into the expression as a test. it's the line:
> val v = boundE.eval(emptyRow)
> i believe it is a bug to assume the expression can always handle nulls.
> for example this fails:
> {noformat}
> val df1 = Seq("a", "b", "c").toDF("x")
>   .withColumn("y", udf{ (x: String) => x.substring(0, 1) + "!" }.apply($"x"))
> val df2 = Seq("a", "b").toDF("x1")
> df1
>   .join(df2, df1("x") === df2("x1"), "left_outer")
>   .filter($"x1".isNotNull || !$"y".isin("a!"))
>   .count
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
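The failure mode described above can be reproduced without Spark at all: the optimizer evaluates the filter expression against an all-null row (`boundE.eval(emptyRow)`), which amounts to feeding `null` to a UDF body that was never written to accept it. A minimal plain-Scala sketch of that mechanism, using a hypothetical stand-in `f` for the UDF body from the repro:

```scala
object NullEvalSketch {
  // hypothetical stand-in for the UDF body above: not null-safe, by design
  val f: String => String = (x: String) => x.substring(0, 1) + "!"

  def main(args: Array[String]): Unit = {
    // feeding a null, as the optimizer's all-null test row effectively does,
    // throws NullPointerException inside the function body
    val threw =
      try { f(null); false }
      catch { case _: NullPointerException => true }
    println(threw) // true
  }
}
```

This is only an analogy for the Catalyst behavior, not the optimizer's actual code path; it shows why a null-intolerant expression cannot safely be probed with an empty (all-null) row.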