Henry Robinson created SPARK-24254:
--------------------------------------
Summary: Eagerly evaluate some subqueries over LocalRelation
Key: SPARK-24254
URL: https://issues.apache.org/jira/browse/SPARK-24254
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Henry Robinson
Some queries would benefit from evaluating subqueries over {{LocalRelations}}
eagerly. For example:
{code}
SELECT t1.part_col FROM t1 JOIN (SELECT max(part_col) m FROM t2) foo WHERE
t1.part_col = foo.m
{code}
If {{max(part_col)}} could be evaluated during planning, there's an opportunity
to prune all but at most one partition from the scan of {{t1}}.
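As a rough illustration of the idea (not Spark code), the sketch below models {{t1}} as a dict keyed by partition value and {{t2}} as a small in-memory list of rows. The names eval_scalar_max and prune_for_equality, and the sample data, are hypothetical and exist only for this example.

```python
# Hypothetical model: t1 is a partitioned table, t2 is a small local relation.
t1_partitions = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}
t2_rows = [{"part_col": 1}, {"part_col": 3}, {"part_col": 2}]


def eval_scalar_max(rows, col):
    """Evaluate max(col) eagerly; None stands in for SQL NULL on empty input."""
    values = [r[col] for r in rows if r[col] is not None]
    return max(values) if values else None


def prune_for_equality(partitions, value):
    """Keep only the partition whose key equals value; nothing matches NULL."""
    if value is None:
        return {}
    return {k: v for k, v in partitions.items() if k == value}


# "Planning time": the subquery result is known before t1 is scanned.
m = eval_scalar_max(t2_rows, "part_col")
pruned = prune_for_equality(t1_partitions, m)
```

With the subquery result known up front, the scan of the modelled {{t1}} touches at most one partition, or none if the subquery yields NULL.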
Similarly, a near-identical query with a non-scalar subquery in the {{WHERE}}
clause:
{code}
SELECT * FROM t1 WHERE part_col IN (SELECT part_col FROM t2)
{code}
could be partially evaluated to eliminate some partitions, and remove the join
from the plan.
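The same idea for the {{IN}} case can be sketched as follows, again as a toy model rather than Spark code. The helper names eval_in_set and prune_for_in and the sample data are assumptions made for illustration.

```python
# Hypothetical model: t1 is keyed by partition value, t2 is a small local relation.
t1_partitions = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"], 4: ["f"]}
t2_rows = [{"part_col": 2}, {"part_col": 4}, {"part_col": 2}, {"part_col": None}]


def eval_in_set(rows, col):
    """Collect the distinct non-NULL values produced by the subquery."""
    return {r[col] for r in rows if r[col] is not None}


def prune_for_in(partitions, values):
    """Keep only partitions whose key appears in the evaluated value set."""
    return {k: v for k, v in partitions.items() if k in values}


in_values = eval_in_set(t2_rows, "part_col")
pruned = prune_for_in(t1_partitions, in_values)
```

Because the predicate in this model refers only to the partition column, every row in a surviving partition satisfies it, so no join against {{t2}} is needed afterwards.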
Obviously, not all subqueries over local relations can be evaluated during
planning, but certain whitelisted aggregates could be if the input cardinality
isn't too high.
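One possible shape for that gating logic is sketched below. The whitelist contents, the row threshold, and the function name try_eager_eval are all hypothetical; this issue does not specify which aggregates or limits would be used.

```python
# Hypothetical whitelist of aggregates considered safe to evaluate eagerly.
EAGER_AGGREGATES = {"max": max, "min": min}
# Hypothetical cap on input rows; above this, evaluation is left to execution.
MAX_EAGER_ROWS = 1000


def try_eager_eval(agg_name, values):
    """Return (True, result) if evaluated during planning, else (False, None)."""
    fn = EAGER_AGGREGATES.get(agg_name)
    if fn is None or len(values) > MAX_EAGER_ROWS:
        return (False, None)
    non_null = [v for v in values if v is not None]
    # An aggregate over no non-NULL input yields NULL, modelled here as None.
    return (True, fn(non_null) if non_null else None)
```

Returning a flag alongside the result lets the caller tell "evaluated to NULL" apart from "not evaluated", so the planner can fall back to the normal plan in the second case.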