Aman Sinha created DRILL-4308:
---------------------------------
Summary: Aggregate operations on dir<N> columns can be more
efficient for certain use cases
Key: DRILL-4308
URL: https://issues.apache.org/jira/browse/DRILL-4308
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Affects Versions: 1.4.0
Reporter: Aman Sinha
For queries that perform plain aggregates or DISTINCT operations on the
directory partition columns (dir0, dir1, etc.) and reference no other columns,
performance could be substantially improved by not scanning the entire
dataset.
Consider the following types of queries:
{noformat}
select min(dir0) from largetable;
select distinct dir0 from largetable;
{noformat}
The number of distinct values of dir<N> columns is typically quite small, so
there is no reason to scan the large table. This request has also come in as
feedback from some Drill users. Of course, if any other column is referenced
in the query (WHERE, ORDER BY, etc.), then we cannot apply this optimization.
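
To illustrate the idea only (this is not Drill's planner code), here is a minimal Python sketch of answering the two queries above from directory metadata alone, without opening any data file. The helper names distinct_dir0 and min_dir0 and the table_root parameter are hypothetical, and the sketch assumes dir0 is simply the name of the first-level subdirectory under the table's root path.

```python
import os

def distinct_dir0(table_root):
    """Hypothetical helper: distinct dir0 values derived from the directory tree only."""
    values = []
    for entry in os.scandir(table_root):
        # Files directly under the root (where dir0 would be NULL) are ignored here.
        if not entry.is_dir():
            continue
        # An empty directory contributes no rows, so require at least one file beneath it.
        if any(files for _, _, files in os.walk(entry.path)):
            values.append(entry.name)
    return sorted(values)

def min_dir0(table_root):
    """Hypothetical helper: min(dir0) computed from directory names, not from data."""
    values = distinct_dir0(table_root)
    return min(values) if values else None
```

The cost of this approach depends on the number of directories rather than on the volume of data, which is the point of the proposed improvement. A real implementation would also have to handle the NULL case for files at the root and respect any file-format filtering applied to the table.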