[
https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800201#comment-16800201
]
Jose Sanchez commented on SPARK-25212:
--------------------------------------
The issue here is that when catalyst runs the *Operator Optimization before
Inferring Filters* batch, it removes the conditions from the Join Inner, so the
*Check Cartesian Products* batch detects a cartesian product:
{code}
Unable to find source-code formatter for language: shell. Available languages
are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go,
groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl,
php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml,
yaml19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation
===
!Join Inner, (value1#3 = value3#10) Join Inner, (value1#3 = 1)
:- Project [value#1 AS value1#3] :- Project [value#1 AS
value1#3]
: +- LocalRelation [value#1] : +- LocalRelation [value#1]
+- Project [value#6 AS value2#8, 1 AS value3#10] +- Project [value#6 AS
value2#8, 1 AS value3#10]
+- LocalRelation [value#6] +- LocalRelation [value#6]
19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2:
=== Applying Rule
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
!Join Inner, (value1#3 = 1) Join Inner
!:- Project [value#1 AS value1#3] :- Filter (value1#3 = 1)
!: +- LocalRelation [value#1] : +- Project [value#1 AS
value1#3]
!+- Project [value#6 AS value2#8, 1 AS value3#10] : +- LocalRelation [value#1]
! +- LocalRelation [value#6] +- Project [value#6 AS
value2#8, 1 AS value3#10]
! +- LocalRelation
[value#6]
...
=== Result of Batch Operator Optimization before Inferring Filters ===
!Join Inner, (value1#3 = value3#10) Join Inner
:- Project [value#1 AS value1#3] :- Project [value#1 AS value1#3]
!: +- LocalRelation [value#1] : +- Filter (value#1 = 1)
!+- Project [value2#8, 1 AS value3#10] : +- LocalRelation [value#1]
! +- Project [value#6 AS value2#8] +- Project [value#6 AS value2#8, 1 AS
value3#10]
! +- LocalRelation [value#6] +- LocalRelation [value#6]
{code}
It makes sense because the join condition has a literal, which is propagated as
a Filter. I had a similar use case where a literal ends up as a *JOIN*
condition, I guess this does not happen very frequently, but it has been fixed
in Spark 2.4.0 by accident with SPARK-25212, thanks to the Batch *LocalRelation
early* which collapses the literal into the *LocalRelation*, so none of the
rules above are triggered.
I wonder, is it worth to backport SPARK-25212 to Spark 2.3 to "fix" this issue?
as I said, looks like this is not a common use case, but Spark 2.4.0 has a
different outcome for these situations.
> Support Filter in ConvertToLocalRelation
> ----------------------------------------
>
> Key: SPARK-25212
> URL: https://issues.apache.org/jira/browse/SPARK-25212
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Bogdan Raducanu
> Assignee: Bogdan Raducanu
> Priority: Major
> Fix For: 2.4.0
>
>
> ConvertToLocalRelation can make short queries faster but currently it only
> supports Project and Limit.
> It can be extended with other operators such as Filter.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]