[ https://issues.apache.org/jira/browse/DRILL-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi resolved DRILL-5773.
----------------------------------------
Resolution: Fixed
Fix Version/s: 1.16.0
Looks like it was fixed in the scope of DRILL-6118.
> Project pushdown into a subquery with select *
> ----------------------------------------------
>
> Key: DRILL-5773
> URL: https://issues.apache.org/jira/browse/DRILL-5773
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Jinfeng Ni
> Assignee: Hanumath Rao Maduri
> Priority: Major
> Fix For: 1.16.0
>
>
> If a subquery, table expression, or view uses `select *` and the outer query
> requests only a subset of columns/fields, Drill currently does not push the
> project down into the subquery. As a result, the scan operator returns every
> column/field in the table, which can significantly hurt query performance,
> especially when the number of columns/fields is large.
> For instance,
> {code}
> SELECT n_regionkey, count(*) AS cnt
> FROM (SELECT * FROM cp.`tpch/nation.parquet`) AS n
> GROUP BY n_regionkey;
> {code}
> Here is the plan
> {code}
> 00-00    Screen
> 00-01      Project(n_regionkey=[$0], cnt=[$1])
> 00-02        Project(n_regionkey=[$0], cnt=[$1])
> 00-03          HashAgg(group=[{0}], cnt=[COUNT()])
> 00-04            Project(n_regionkey=[ITEM($0, 'n_regionkey')])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> Notice that the Scan operator shows `columns = *`, indicating that it will
> read every column.
> From a performance perspective, Drill should push the project into a subquery
> that uses `select *`.
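> For comparison, the projection can be pushed down by hand. The sketch below
> assumes that naming the column in the subquery lets the planner prune the
> scan, so that the Scan operator would list only `n_regionkey` rather than
> `*`. The resulting plan is not reproduced here; EXPLAIN PLAN FOR can be used
> to check it.
> {code:sql}
> -- Manual rewrite: the subquery projects only the column the outer query uses.
> EXPLAIN PLAN FOR
> SELECT n_regionkey, count(*) AS cnt
> FROM (SELECT n_regionkey FROM cp.`tpch/nation.parquet`) AS n
> GROUP BY n_regionkey;
> {code}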
>