Lu Yan created SPARK-5614:
-----------------------------

             Summary: Predicate pushdown through Generate
                 Key: SPARK-5614
                 URL: https://issues.apache.org/jira/browse/SPARK-5614
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.2.0
            Reporter: Lu Yan


Currently, Catalyst's optimizer rules cannot push predicates through "Generate" 
nodes. Furthermore, partition pruning in HiveTableScan cannot be applied to 
queries that involve "Generate". This makes such queries very inefficient.
For example, the physical plan for the query
{quote}
select len, bk
from s_server lateral view explode(len_arr) len_table as len 
where len > 5 and day = '20150102';
{quote}
(where 'day' is a partition column in the metastore) looks like this in the 
current version of Spark SQL:
{quote}
Project [len, bk]

Filter ((len > "5") && (event_day = "20150102"))

Generate explode(len_arr), true, false

HiveTableScan [bk, len_arr, day], (MetastoreRelation pblog.audio, 
audio_central_ts_server, None), None
{quote}
In principle, however, the plan should look like this:
{quote}
Project [len, bk]

Filter (len > "5")

Generate explode(len_arr), true, false

HiveTableScan [bk, len_arr, day], (MetastoreRelation pblog.audio, 
audio_central_ts_server, None), Some(event_day = "20150102")
{quote} 
Here the partition pruning predicate is pushed down into the HiveTableScan node.
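To make the intended rewrite concrete, here is a minimal sketch on a toy plan tree. It is not Catalyst code: the classes Predicate, Scan, Generate and Filter and the function push_through_generate are hypothetical names invented for illustration, and the sketch assumes every predicate is deterministic. The idea is that a conjunct which does not reference the generator's output can be evaluated below "Generate", and if it references only partition columns it can be handed to the scan as a pruning predicate.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Predicate:
    text: str                     # human-readable form, e.g. "len > 5"
    references: FrozenSet[str]    # column names the predicate reads


@dataclass
class Scan:
    columns: List[str]
    partition_columns: FrozenSet[str]
    pruning: Tuple[Predicate, ...] = ()   # partition pruning predicates


@dataclass
class Generate:
    generator: str
    output: FrozenSet[str]        # columns produced by the generator
    child: object


@dataclass
class Filter:
    conditions: Tuple[Predicate, ...]     # implicitly AND-ed conjuncts
    child: object


def push_through_generate(plan):
    """Rewrite Filter(Generate(child)) so that conjuncts which do not
    reference the generator output are evaluated below the Generate node."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, Generate)):
        return plan
    gen = plan.child
    # Conjuncts that read generated columns must stay above Generate.
    stay = tuple(p for p in plan.conditions if p.references & gen.output)
    push = tuple(p for p in plan.conditions if not (p.references & gen.output))
    if not push:
        return plan

    child = gen.child
    if isinstance(child, Scan):
        # Conjuncts over partition columns only become pruning predicates.
        prune = tuple(p for p in push
                      if p.references <= child.partition_columns)
        rest = tuple(p for p in push
                     if not (p.references <= child.partition_columns))
        new_child = Scan(child.columns, child.partition_columns,
                         child.pruning + prune)
        if rest:
            new_child = Filter(rest, new_child)
    else:
        new_child = Filter(push, child)

    new_gen = Generate(gen.generator, gen.output, new_child)
    return Filter(stay, new_gen) if stay else new_gen


# The query from this issue, using the column names of the SQL text.
scan = Scan(["bk", "len_arr", "day"], frozenset({"day"}))
plan = Filter(
    (Predicate("len > 5", frozenset({"len"})),
     Predicate("day = '20150102'", frozenset({"day"}))),
    Generate("explode(len_arr)", frozenset({"len"}), scan),
)
optimized = push_through_generate(plan)
```

After the rewrite, only the predicate on 'len' remains in the Filter above "Generate", while the predicate on 'day' is attached to the scan. A real rule would also have to handle non-deterministic expressions and attribute resolution, which this sketch ignores.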
I have developed a solution for this issue. If there is no existing plan to 
address it, I could merge the solution back to master.
There is also a problem with column pruning for "Generate"; I will file a 
separate issue for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
