Shawn Lavelle created SPARK-19730:
-------------------------------------
Summary: Predicate Subqueries do not push results of subqueries to
data source
Key: SPARK-19730
URL: https://issues.apache.org/jira/browse/SPARK-19730
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 2.1.0
Reporter: Shawn Lavelle
When a Spark SQL query contains a subquery in the WHERE clause, such as a
predicate subquery using the IN operator, the results of that subquery are not
pushed down as a filter to the Data Source API for the outer query.
Example:
Select point, time, value from data where time between now()-86400 and now()
and point in (select point from groups where group_id=5);
Two queries are sent to the data source: one for the subquery and another
for the outer query. The subquery works correctly and returns the points in the
group; however, the outer query does not push a filter for the point column.
Impact:
The "groups" table has a few hundred rows that group a few hundred thousand
points. The data table has several billion rows keyed by point and time.
Without the ability to push down filters on the columns of the outer
query, the data source cannot properly conduct its pruned scan.
The subquery results should be pushed down to the outer query as an IN Filter
with the results of the subquery.
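
To make the request concrete, here is a minimal toy model in plain Python
(not Spark code). ToySource, its scan() method, and the sample rows are all
hypothetical; the optional "points" argument stands in for an IN filter on
the point column, such as org.apache.spark.sql.sources.In, arriving at the
data source. It only illustrates how many rows a keyed source has to touch
with and without that filter.

```python
from dataclasses import dataclass


@dataclass
class ToySource:
    """Hypothetical stand-in for a data source keyed by point and time."""
    rows: dict          # point -> list of (time, value)
    rows_read: int = 0  # rows the source had to touch

    def scan(self, time_range, points=None):
        """Scan the source; 'points' plays the role of a pushed-down IN filter."""
        lo, hi = time_range
        if points is None:
            keys = list(self.rows)                         # no pruning by point
        else:
            keys = [p for p in points if p in self.rows]   # pruned by point
        out = []
        for p in keys:
            for t, v in self.rows[p]:
                self.rows_read += 1
                if lo <= t <= hi:
                    out.append((p, t, v))
        return out


# Made-up sample data: 1000 points with 5 samples each.
data = ToySource({p: [(t, p * 10 + t) for t in range(5)] for p in range(1000)})
groups = [(5, 1), (5, 2), (7, 3)]  # (group_id, point)

# Subquery: select point from groups where group_id = 5
group_points = sorted({point for gid, point in groups if gid == 5})

# Reported behaviour: no point filter reaches the source, so the IN
# predicate is applied only after the scan.
unpruned = [r for r in data.scan((1, 3)) if r[0] in group_points]
rows_read_unpruned = data.rows_read

# Requested behaviour: the subquery results reach the source as an IN filter.
data.rows_read = 0
pruned = data.scan((1, 3), points=group_points)
rows_read_pruned = data.rows_read

print(rows_read_unpruned, rows_read_pruned)
```

Both paths return the same rows; only the amount of data the source has to
read differs.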