[ https://issues.apache.org/jira/browse/SPARK-27411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-27411:
-----------------------------------

    Assignee: Mingcong Han

> DataSourceV2Strategy should not eliminate subquery
> --------------------------------------------------
>
>                 Key: SPARK-27411
>                 URL: https://issues.apache.org/jira/browse/SPARK-27411
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Mingcong Han
>            Assignee: Mingcong Han
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In DataSourceV2Strategy, it seems we eliminate subqueries by mistake after normalizing filters. Here is an example:
> We have a SQL query with a scalar subquery:
> {code:scala}
> val plan = spark.sql("select * from t2 where t2a > (select max(t1a) from t1)")
> plan.explain(true)
> {code}
> And we get the following log output from DataSourceV2Strategy:
> {noformat}
> Pushing operators to csv:examples/src/main/resources/t2.txt
> Pushed Filters:
> Post-Scan Filters: isnotnull(t2a#30)
> Output: t2a#30, t2b#31
> {noformat}
> The `Post-Scan Filters` should contain the scalar subquery, but we eliminate it by mistake.
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter ('t2a > scalar-subquery#56 [])
>    :  +- 'Project [unresolvedalias('max('t1a), None)]
>    :     +- 'UnresolvedRelation `t1`
>    +- 'UnresolvedRelation `t2`
>
> == Analyzed Logical Plan ==
> t2a: string, t2b: string
> Project [t2a#30, t2b#31]
> +- Filter (t2a#30 > scalar-subquery#56 [])
>    :  +- Aggregate [max(t1a#13) AS max(t1a)#63]
>    :     +- SubqueryAlias `t1`
>    :        +- RelationV2[t1a#13, t1b#14] csv:examples/src/main/resources/t1.txt
>    +- SubqueryAlias `t2`
>       +- RelationV2[t2a#30, t2b#31] csv:examples/src/main/resources/t2.txt
>
> == Optimized Logical Plan ==
> Filter (isnotnull(t2a#30) && (t2a#30 > scalar-subquery#56 []))
> :  +- Aggregate [max(t1a#13) AS max(t1a)#63]
> :     +- Project [t1a#13]
> :        +- RelationV2[t1a#13, t1b#14] csv:examples/src/main/resources/t1.txt
> +- RelationV2[t2a#30, t2b#31] csv:examples/src/main/resources/t2.txt
>
> == Physical Plan ==
> *(1) Project [t2a#30, t2b#31]
> +- *(1) Filter isnotnull(t2a#30)
>    +- *(1) BatchScan[t2a#30, t2b#31] class org.apache.spark.sql.execution.datasources.v2.csv.CSVScan
> {noformat}
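
The symptom described above (a filter containing a scalar subquery missing from the post-scan filters) can be illustrated with a minimal, Spark-free sketch. Everything in it is an assumption made for illustration: `Expr`, `IsNotNull`, `GreaterThanSubquery`, `pushFilters`, and both `plan...` functions are hypothetical stand-ins, not Spark's classes or the actual DataSourceV2Strategy code. It only shows one way such a loss could happen: subquery filters are set aside before pushdown and never added back.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions; not Spark's real classes.
sealed trait Expr
case class IsNotNull(column: String) extends Expr
case class GreaterThanSubquery(column: String, subquerySql: String) extends Expr

object SubqueryFilterSketch {
  // True when the filter embeds a subquery.
  def hasSubquery(e: Expr): Boolean = e match {
    case _: GreaterThanSubquery => true
    case _                      => false
  }

  // Assumed toy source that cannot push down anything:
  // returns (pushed filters, post-scan filters).
  def pushFilters(filters: Seq[Expr]): (Seq[Expr], Seq[Expr]) =
    (Seq.empty, filters)

  // Problematic shape: subquery filters are excluded before pushdown
  // and never re-attached, so they vanish from the plan.
  def planDroppingSubqueries(filters: Seq[Expr]): Seq[Expr] = {
    val withoutSubquery = filters.filterNot(hasSubquery)
    val (_, postScan) = pushFilters(withoutSubquery)
    postScan
  }

  // One possible remedy: keep the subquery filters aside and
  // append them to the post-scan filters afterwards.
  def planKeepingSubqueries(filters: Seq[Expr]): Seq[Expr] = {
    val (withSubquery, withoutSubquery) = filters.partition(hasSubquery)
    val (_, postScan) = pushFilters(withoutSubquery)
    postScan ++ withSubquery
  }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      IsNotNull("t2a"),
      GreaterThanSubquery("t2a", "select max(t1a) from t1")
    )
    println(s"dropping: ${planDroppingSubqueries(filters)}")
    println(s"keeping:  ${planKeepingSubqueries(filters)}")
  }
}
```

In this toy model the first function returns only the `IsNotNull` filter, mirroring the `Post-Scan Filters: isnotnull(t2a#30)` log line in the report, while the second returns both filters. Whether the real fix takes this shape is not stated in the issue text.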