[ 
https://issues.apache.org/jira/browse/SPARK-44812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

copperybean updated SPARK-44812:
--------------------------------
    Description: 
For following SQL
{code:sql}
select a from (select a from tl intersect select x from tr) where a > 123 {code}
The physical plan is
{code:bash}
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#8L], functions=[])
   +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
      +- HashAggregate(keys=[a#8L], functions=[])
         +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
[coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
            :- Filter (isnotnull(a#8L) AND (a#8L > 123))
            :  +- FileScan json [a#8L] ...
            +- BroadcastExchange 
HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
isnull(input[0, bigint, true])),false), [plan_id=129]
               +- FileScan json [x#24L] ... {code}
We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right 
table.

 

 

Further more, for following SQL
{code:sql}
select a from (select a from tl intersect select x from tr) join trr on a = 
y{code}
The physical plan is
{code:bash}
*(3) Project [a#8L]
+- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
   :- *(3) HashAggregate(keys=[a#8L], functions=[])
   :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, 
[plan_id=506]
   :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
   :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
[coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
   :           :- *(1) Filter isnotnull(a#8L)
   :           :  +- FileScan json [a#8L] ...
   :           +- BroadcastExchange 
HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
isnull(input[0, bigint, true])),false), [plan_id=490]
   :              +- FileScan json [x#24L] ...
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
false]),false), [plan_id=512]
      +- *(2) Filter isnotnull(y#114L)
         +- FileScan json [y#114L] ...{code}
There should be a filter _{color:#ff8b00}isnotnull(x){color}_ for table tr, 
while it's not pushed down.

  was:
For following SQL
{code:sql}
select a from (select a from tl intersect select x from tr) where a > 123 {code}
The physical plan is
{code:bash}
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#8L], functions=[])
   +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
      +- HashAggregate(keys=[a#8L], functions=[])
         +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
[coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
            :- Filter (isnotnull(a#8L) AND (a#8L > 123))
            :  +- FileScan json [a#8L] ...
            +- BroadcastExchange 
HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
isnull(input[0, bigint, true])),false), [plan_id=129]
               +- FileScan json [x#24L] ... {code}
We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right 
table.

 

 

Further more, for following SQL
{code:sql}
select a from (select a from tl intersect select x from tr) join trr on a = 
y{code}
The physical plan is
{code:bash}
*(3) Project [a#8L]
+- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
   :- *(3) HashAggregate(keys=[a#8L], functions=[])
   :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, 
[plan_id=506]
   :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
   :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
[coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
   :           :- *(1) Filter isnotnull(a#8L)
   :           :  +- FileScan json [a#8L] ...
   :           +- BroadcastExchange 
HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
isnull(input[0, bigint, true])),false), [plan_id=490]
   :              +- FileScan json [x#24L] ...
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
false]),false), [plan_id=512]
      +- *(2) Filter isnotnull(y#114L)
         +- FileScan json [y#114L] ...{code}
There should be a filter isnotnull(x) for table tr, while it's not pushed down.


> Push filters through intersect
> ------------------------------
>
>                 Key: SPARK-44812
>                 URL: https://issues.apache.org/jira/browse/SPARK-44812
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.1
>            Reporter: copperybean
>            Priority: Major
>
> For following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) where a > 123 
> {code}
> The physical plan is
> {code:bash}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#8L], functions=[])
>    +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
>       +- HashAggregate(keys=[a#8L], functions=[])
>          +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>             :- Filter (isnotnull(a#8L) AND (a#8L > 123))
>             :  +- FileScan json [a#8L] ...
>             +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=129]
>                +- FileScan json [x#24L] ... {code}
> We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right 
> table.
>  
>  
> Further more, for following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) join trr on a = 
> y{code}
> The physical plan is
> {code:bash}
> *(3) Project [a#8L]
> +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
>    :- *(3) HashAggregate(keys=[a#8L], functions=[])
>    :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, 
> [plan_id=506]
>    :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
>    :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>    :           :- *(1) Filter isnotnull(a#8L)
>    :           :  +- FileScan json [a#8L] ...
>    :           +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=490]
>    :              +- FileScan json [x#24L] ...
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
> false]),false), [plan_id=512]
>       +- *(2) Filter isnotnull(y#114L)
>          +- FileScan json [y#114L] ...{code}
> There should be a filter _{color:#ff8b00}isnotnull(x){color}_ for table tr, 
> while it's not pushed down.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to