Juliusz Sompolski created SPARK-23087:
-----------------------------------------
Summary: CheckCartesianProduct too restrictive when condition is
constant folded to false/null
Key: SPARK-23087
URL: https://issues.apache.org/jira/browse/SPARK-23087
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1, 2.3.0
Reporter: Juliusz Sompolski
Running
{code}
sql("SELECT id as a FROM RANGE(10)").createOrReplaceTempView("A")
sql("SELECT NULL as a FROM RANGE(10)").createOrReplaceTempView("NULLTAB")
sql("SELECT 1 as goo FROM A LEFT OUTER JOIN NULLTAB ON A.a = NULLTAB.a").collect()
{code}
results in:
{code}
org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
Project
+- Range (0, 10, step=1, splits=None)
and
Project
+- Range (0, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
{code}
This happens because NULLTAB.a is constant-folded to null, after which
NullPropagation folds the whole join condition to null as well:
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.NullPropagation ===
 GlobalLimit 21
 +- LocalLimit 21
    +- Project [1 AS goo#28]
!      +- Join LeftOuter, (a#0L = null)
          :- Project [id#1L AS a#0L]
          :  +- Range (0, 10, step=1, splits=None)
          +- Project
             +- Range (0, 10, step=1, splits=None)

 GlobalLimit 21
 +- LocalLimit 21
    +- Project [1 AS goo#28]
       +- Join LeftOuter, null
          :- Project [id#1L AS a#0L]
          :  +- Range (0, 10, step=1, splits=None)
          +- Project
             +- Range (0, 10, step=1, splits=None)
{code}
CheckCartesianProducts then rejects the plan, even though the join is not a
cartesian product: a condition that evaluates to null matches no rows, so the
LEFT OUTER join returns each row of A exactly once.
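
To make the row-count argument concrete, here is a minimal sketch in plain Scala (no Spark dependency) of a left outer join under SQL three-valued logic. It is a toy model for illustration only, not how Catalyst implements joins; the names NullJoinSketch, sqlEq and leftOuterJoin are hypothetical, and SQL NULL is modelled as None.

```scala
// Toy model of SQL three-valued logic; this is NOT Spark code.
object NullJoinSketch {
  // SQL equality: if either side is NULL (None), the result is NULL (None).
  def sqlEq(l: Option[Long], r: Option[Long]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // LEFT OUTER JOIN: keep pairs whose condition is TRUE; a left row with
  // no match is emitted exactly once, padded with NULL on the right side.
  def leftOuterJoin[L, R](left: Seq[L], right: Seq[R])(
      cond: (L, R) => Option[Boolean]): Seq[(L, Option[R])] =
    left.flatMap { l =>
      val matches: Seq[(L, Option[R])] =
        right.filter(r => cond(l, r).contains(true)).map(r => (l, Some(r)))
      if (matches.isEmpty) Seq((l, Option.empty[R])) else matches
    }

  def main(args: Array[String]): Unit = {
    // Stand-ins for the views A (ids 0..9) and NULLTAB (ten NULLs).
    val a: Seq[Option[Long]] = (0L until 10L).map(Option(_))
    val nullTab: Seq[Option[Long]] = Seq.fill(10)(Option.empty[Long])

    val joined = leftOuterJoin(a, nullTab)(sqlEq)
    println(joined.size) // prints 10, not the 100 rows of a cartesian product
  }
}
```

Under this model the null condition behaves like a condition that is always false: the join degenerates to "every left row, null-padded", which is the result the query in this report would be expected to return.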