[ https://issues.apache.org/jira/browse/DRILL-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni updated DRILL-684:
-----------------------------
    Attachment: DRILL-684.1.patch

In addition to the code change for row count, the patch contains bug fixes:
1) Set the type's nullable property for the extract function, and for the 'any' type in a view DDL or table column list.
2) Fix a bug in the logical/physical Project rule: set up the traits properly.

> Use parquet row count in cost-based optimization. Use parquet row count,
> column value count to optimize count() aggregate function.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-684
>                 URL: https://issues.apache.org/jira/browse/DRILL-684
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>         Attachments: DRILL-684.1.patch
>
>
> The Parquet group scan provides the exact row count and the exact value count for
> each individual column. This information could be leveraged in the following
> two ways:
> 1. Use the row count in cost estimation when a query refers to Parquet files.
> 2. Use the row count or column value count to optimize the count() aggregate
> function.
> For instance:
>   select count(*) from parquet_file;
>   select count(column_a) from parquet_file;
> The first query could be transformed to return the row count directly; the second
> could return the column value count for 'column_a'. Both cases avoid scanning
> the whole Parquet files, thus improving query performance.
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
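
The count() rewrite described in the issue can be illustrated with a minimal sketch. It is not taken from the patch: `ParquetFileStats` and `count_from_metadata` are hypothetical names, and the sketch assumes the per-column value count is the number that count(column) should return (that is, non-null values only).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParquetFileStats:
    """Hypothetical stand-in for the counts a Parquet group scan exposes."""
    row_count: int
    # Assumption: value count per column excludes nulls.
    column_value_counts: Dict[str, int] = field(default_factory=dict)


def count_from_metadata(
    files: List[ParquetFileStats], column: Optional[str] = None
) -> Optional[int]:
    """Answer count(*) (column=None) or count(column) from metadata alone.

    Returns None when any file lacks the needed count, which signals that
    the caller has to fall back to a regular scan.
    """
    total = 0
    for stats in files:
        if column is None:
            total += stats.row_count
        elif column in stats.column_value_counts:
            total += stats.column_value_counts[column]
        else:
            return None
    return total


files = [
    ParquetFileStats(row_count=100, column_value_counts={"column_a": 90}),
    ParquetFileStats(row_count=50, column_value_counts={"column_a": 50}),
]
print(count_from_metadata(files))              # select count(*) from parquet_file
print(count_from_metadata(files, "column_a"))  # select count(column_a) from parquet_file
```

Neither call reads any row data; both only sum the counts that are already available per file, which is where the performance gain in the issue description comes from.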