Jinfeng Ni created DRILL-5773:
---------------------------------

Summary: Project pushdown into a subquery with select *
Key: DRILL-5773
URL: https://issues.apache.org/jira/browse/DRILL-5773
Project: Apache Drill
Issue Type: Improvement
Reporter: Jinfeng Ni

If a subquery, table expression, or view uses `select *` and the outer query requests only a subset of its columns/fields, Drill currently does not push the projection down into the subquery. As a result, the scan operator returns every column/field in the table, which can significantly hurt query performance, especially when the number of columns/fields is large.

For instance,

{code}
SELECT n_regionkey, count(*) AS cnt
FROM (SELECT * FROM cp.`tpch/nation.parquet`) AS n
GROUP BY n_regionkey;
{code}

Here is the plan:

{code}
00-00    Screen
00-01      Project(n_regionkey=[$0], cnt=[$1])
00-02        Project(n_regionkey=[$0], cnt=[$1])
00-03          HashAgg(group=[{0}], cnt=[COUNT()])
00-04            Project(n_regionkey=[ITEM($0, 'n_regionkey')])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{code}

Notice that the Scan operator shows `columns = *`, indicating that it will read every column.

From a performance perspective, Drill should push the project into a subquery with select *.
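
For comparison, the projection can be narrowed by hand by listing the needed column in the subquery instead of using `select *`. The query below is a minimal sketch of that rewrite against the same sample table. It is an illustration added here, not part of the original report, and the resulting plan has not been verified; the expectation is that the Scan operator would then list only `n_regionkey` in its columns rather than `*`.

```sql
-- Hand-written equivalent of the query above: the subquery projects only
-- the column the outer query needs, so the scan has no reason to read the rest.
-- Prefix with EXPLAIN PLAN FOR to inspect the columns=[...] entry of the Scan.
SELECT n_regionkey, count(*) AS cnt
FROM (SELECT n_regionkey FROM cp.`tpch/nation.parquet`) AS n
GROUP BY n_regionkey;
```

This is the result the requested improvement would presumably produce automatically, without the user having to rewrite the subquery or view.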