Lu Yan created SPARK-5614:
-----------------------------
Summary: Predicate pushdown through Generate
Key: SPARK-5614
URL: https://issues.apache.org/jira/browse/SPARK-5614
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: Lu Yan
Now in Catalyst's rules, predicates can not be pushed through "Generate" nodes.
Further more, partition pruning in HiveTableScan can not be applied on those
queries involves "Generate". This makes such queries very inefficient.
For example, physical plan for query
{quote}
select len, bk
from s_server lateral view explode(len_arr) len_table as len
where len > 5 and day = '20150102';
{quote}
where 'day' is a partition column in metastore is like this in current version
of Spark SQL:
{quote}
Project [len, bk]
Filter ((len > "5") && "(event_day = "20150102")")
Generate explode(len_arr), true, false
HiveTableScan [bk, len_arr, day], (MetastoreRelation pblog.audio,
audio_central_ts_server, None), None
{quote}
But theoretically the plan should be like this
{quote}
Project [len, bk]
Filter (len > "5")
Generate explode(len_arr), true, false
HiveTableScan [bk, len_arr, day], (MetastoreRelation pblog.audio,
audio_central_ts_server, None), Some(event_day = "20150102")
{quote}
Where partition pruning predicates can be pushed to HiveTableScan nodes.
I've developed a solution on this issue. If you guys do not have a plan for
this already, I could merge the solution back to master.
And there is also a problem on column pruning for "Generate", I would file
another issue about that.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]