[ https://issues.apache.org/jira/browse/SPARK-44812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
copperybean updated SPARK-44812: -------------------------------- Description: For following SQL {code:sql} select a from (select a from tl intersect select x from tr) where a > 123 {code} The physical plan is {code:bash} == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- HashAggregate(keys=[a#8L], functions=[]) +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133] +- HashAggregate(keys=[a#8L], functions=[]) +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false :- Filter (isnotnull(a#8L) AND (a#8L > 123)) : +- FileScan json [a#8L] ... +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=129] +- FileScan json [x#24L] ... {code} We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right table. Further more, for following SQL {code:sql} select a from (select a from tl intersect select x from tr) join trr on a = y{code} The physical plan is {code:bash} *(3) Project [a#8L] +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false :- *(3) HashAggregate(keys=[a#8L], functions=[]) : +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=506] : +- *(1) HashAggregate(keys=[a#8L], functions=[]) : +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false : :- *(1) Filter isnotnull(a#8L) : : +- FileScan json [a#8L] ... : +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=490] : +- FileScan json [x#24L] ... +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=512] +- *(2) Filter isnotnull(y#114L) +- FileScan json [y#114L] ...{code} There should be a filter _{color:#ff8b00}isnotnull(x){color}_ for table tr, while it's not pushed down. was: For following SQL {code:sql} select a from (select a from tl intersect select x from tr) where a > 123 {code} The physical plan is {code:bash} == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- HashAggregate(keys=[a#8L], functions=[]) +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133] +- HashAggregate(keys=[a#8L], functions=[]) +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false :- Filter (isnotnull(a#8L) AND (a#8L > 123)) : +- FileScan json [a#8L] ... +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=129] +- FileScan json [x#24L] ... {code} We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right table. Further more, for following SQL {code:sql} select a from (select a from tl intersect select x from tr) join trr on a = y{code} The physical plan is {code:bash} *(3) Project [a#8L] +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false :- *(3) HashAggregate(keys=[a#8L], functions=[]) : +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=506] : +- *(1) HashAggregate(keys=[a#8L], functions=[]) : +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false : :- *(1) Filter isnotnull(a#8L) : : +- FileScan json [a#8L] ... : +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=490] : +- FileScan json [x#24L] ... +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=512] +- *(2) Filter isnotnull(y#114L) +- FileScan json [y#114L] ...{code} There should be a filter isnotnull(x) for table tr, while it's not pushed down. > Push filters through intersect > ------------------------------ > > Key: SPARK-44812 > URL: https://issues.apache.org/jira/browse/SPARK-44812 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.1 > Reporter: copperybean > Priority: Major > > For following SQL > {code:sql} > select a from (select a from tl intersect select x from tr) where a > 123 > {code} > The physical plan is > {code:bash} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- HashAggregate(keys=[a#8L], functions=[]) > +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133] > +- HashAggregate(keys=[a#8L], functions=[]) > +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], > [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false > :- Filter (isnotnull(a#8L) AND (a#8L > 123)) > : +- FileScan json [a#8L] ... > +- BroadcastExchange > HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), > isnull(input[0, bigint, true])),false), [plan_id=129] > +- FileScan json [x#24L] ... {code} > We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right > table. > > > Further more, for following SQL > {code:sql} > select a from (select a from tl intersect select x from tr) join trr on a = > y{code} > The physical plan is > {code:bash} > *(3) Project [a#8L] > +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false > :- *(3) HashAggregate(keys=[a#8L], functions=[]) > : +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, > [plan_id=506] > : +- *(1) HashAggregate(keys=[a#8L], functions=[]) > : +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], > [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false > : :- *(1) Filter isnotnull(a#8L) > : : +- FileScan json [a#8L] ... > : +- BroadcastExchange > HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), > isnull(input[0, bigint, true])),false), [plan_id=490] > : +- FileScan json [x#24L] ... > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false]),false), [plan_id=512] > +- *(2) Filter isnotnull(y#114L) > +- FileScan json [y#114L] ...{code} > There should be a filter _{color:#ff8b00}isnotnull(x){color}_ for table tr, > while it's not pushed down. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org