[jira] [Created] (DRILL-4308) Aggregate operations on dir<N> columns can be more efficient for certain use cases

Aman Sinha (JIRA) Mon, 25 Jan 2016 09:15:02 -0800

Aman Sinha created DRILL-4308:
---------------------------------

             Summary: Aggregate operations on dir<N> columns can be more 
efficient for certain use cases
                 Key: DRILL-4308
                 URL: https://issues.apache.org/jira/browse/DRILL-4308
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
    Affects Versions: 1.4.0
            Reporter: Aman Sinha



Queries that perform plain aggregates or DISTINCT operations on the 
directory partition columns (dir0, dir1, etc.) and reference no other columns 
could run substantially faster if they did not have to scan the entire 
dataset.

Consider the following types of queries:
{noformat}
select  min(dir0) from largetable;
select  distinct dir0 from largetable;
{noformat}

The number of distinct values of dir<N> columns is typically quite small, so 
there's no reason to scan the large table.  This has also come up as feedback 
from some Drill users.  Of course, if any other column is referenced in the 
query (WHERE, ORDER BY, etc.), we cannot apply this optimization.
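
To illustrate the idea (this is only a rough sketch, not Drill code and not a 
proposed implementation), the values of a dir<N> column can be derived from 
the directory listing alone, without opening any data file.  The function 
names below are hypothetical, and the sketch assumes a table stored on a local 
file system, where dir<N> is the N-th path component below the table root and 
is NULL for files that sit at a shallower depth.

```python
import os


def distinct_dir_values(table_root, level=0):
    """Return the set of dir<level> values using file-system metadata only.

    Hypothetical helper: walks the directory tree under table_root and, for
    every directory that directly contains at least one file, records the path
    component at position `level`. No data file is opened or read.
    """
    values = set()
    for dirpath, _dirnames, filenames in os.walk(table_root):
        if not filenames:
            # A directory with no files of its own contributes no rows.
            continue
        rel = os.path.relpath(dirpath, table_root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        # Files shallower than `level` get None, standing in for SQL NULL.
        values.add(parts[level] if level < len(parts) else None)
    return values


def min_dir_value(table_root, level=0):
    """Return MIN(dir<level>), ignoring NULLs as SQL MIN does."""
    non_null = [v for v in distinct_dir_values(table_root, level) if v is not None]
    return min(non_null) if non_null else None
```

For a table laid out as largetable/2015/Q1/..., largetable/2016/Q1/..., 
distinct_dir_values("largetable") would stand in for 
"select distinct dir0 from largetable" and min_dir_value("largetable") for 
"select min(dir0) from largetable".  The cost depends on the number of 
directories rather than on the amount of data.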





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
