[ https://issues.apache.org/jira/browse/SPARK-24254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-24254:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Eagerly evaluate some subqueries over LocalRelation
> ---------------------------------------------------
>
>                 Key: SPARK-24254
>                 URL: https://issues.apache.org/jira/browse/SPARK-24254
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Henry Robinson
>            Priority: Major
>
> Some queries would benefit from evaluating subqueries over {{LocalRelation}}s
> eagerly. For example:
> {code}
> SELECT t1.part_col FROM t1 JOIN (SELECT max(part_col) m FROM t2) foo WHERE
> t1.part_col = foo.m
> {code}
> If {{max(part_col)}} could be evaluated during planning, there's an
> opportunity to prune all but at most one partition from the scan of {{t1}}.
> Similarly, a near-identical query with a non-scalar subquery in the {{WHERE}}
> clause:
> {code}
> SELECT * FROM t1 WHERE part_col IN (SELECT part_col FROM t2)
> {code}
> could be partially evaluated to eliminate some partitions, and remove the
> join from the plan.
> Obviously, not all subqueries over local relations can be evaluated during
> planning, but certain whitelisted aggregates could be if the input
> cardinality isn't too high.
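To make the proposal concrete, here is a minimal Python sketch of the idea. It is not Spark code and does not use any Spark API. The names {{MAX_EAGER_ROWS}}, {{eager_max}}, {{prune_for_max}} and {{prune_for_in}} are hypothetical, and the row threshold is an assumed stand-in for "input cardinality isn't too high". Partitions of {{t1}} are modelled as a list of partition-column values, and {{t2}} as an in-memory list of {{part_col}} values.

```python
from typing import List, Optional

# Hypothetical cut-off: only evaluate eagerly when the local relation is small.
MAX_EAGER_ROWS = 1000


def eager_max(local_rows: List[int], limit: int = MAX_EAGER_ROWS) -> Optional[int]:
    """Evaluate max(part_col) at planning time.

    Returns None when the input is empty or too large, which tells the
    caller to fall back to ordinary runtime evaluation.
    """
    if not local_rows or len(local_rows) > limit:
        return None
    return max(local_rows)


def prune_for_max(partitions: List[int], local_rows: List[int]) -> List[int]:
    """Partitions of t1 to scan for: t1.part_col = (SELECT max(part_col) FROM t2)."""
    m = eager_max(local_rows)
    if m is None:
        # Conservative fallback: no pruning, scan every partition.
        return sorted(partitions)
    # At most one partition can match the single max value.
    return [p for p in sorted(partitions) if p == m]


def prune_for_in(partitions: List[int], local_rows: List[int],
                 limit: int = MAX_EAGER_ROWS) -> List[int]:
    """Partitions of t1 to scan for: part_col IN (SELECT part_col FROM t2)."""
    if len(local_rows) > limit:
        # Too many rows to evaluate during planning; keep the full scan.
        return sorted(partitions)
    wanted = set(local_rows)
    return [p for p in sorted(partitions) if p in wanted]


if __name__ == "__main__":
    t1_partitions = [1, 2, 3]
    t2_part_col = [1, 3, 3]
    print(prune_for_max(t1_partitions, t2_part_col))
    print(prune_for_in(t1_partitions, t2_part_col))
```

In this sketch the scalar case keeps at most one partition and the {{IN}} case keeps only the partitions whose value appears in {{t2}}, so no join is needed afterwards. Anything above the threshold is left untouched, which mirrors the caveat that only some subqueries are cheap enough to evaluate during planning.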