[ https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zelaine Fong reassigned DRILL-4308:
-----------------------------------

    Assignee: Jinfeng Ni

[~jni] - I believe the changes you're currently working on as part of DRILL-4387 will address this. Right?

> Aggregate operations on dir<N> columns can be more efficient for certain use
> cases
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-4308
>                 URL: https://issues.apache.org/jira/browse/DRILL-4308
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> For queries that perform plain aggregates or DISTINCT operations on the
> directory partition columns (dir0, dir1, etc.) and reference no other columns,
> performance could be improved substantially by not scanning the entire
> dataset.
> Consider the following types of queries:
> {noformat}
> select min(dir0) from largetable;
> select distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir<N> columns is typically quite small, and
> there's no reason to scan the large table. This has also come up as feedback
> from some Drill users. Of course, if any other column is referenced in the
> query (WHERE, ORDER BY, etc.), then we cannot apply this optimization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
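A minimal sketch of the idea in the issue description, assuming the table's data files are laid out as <root>/<dir0>/<dir1>/.../<file>. Because the dir<N> values come from file paths alone, MIN and DISTINCT on them need only a directory listing, not a scan of the file contents. The helper names (only_dir_columns, dir_value, distinct_dir, min_dir) are hypothetical and are not Drill APIs. Two further assumptions: a file that is not nested deeply enough yields NULL (None) for dir<N>, and values compare as strings.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List, Optional, Set

# Hypothetical: matches implicit partition column names such as dir0, dir1, ...
DIR_COLUMN = re.compile(r"dir\d+")


def only_dir_columns(referenced_columns: Iterable[str]) -> bool:
    """True if every column the query references is a dir<N> column.

    If any other column appears (SELECT list, WHERE, ORDER BY, ...), the file
    contents are needed and the metadata-only shortcut does not apply.
    """
    cols = list(referenced_columns)
    return bool(cols) and all(DIR_COLUMN.fullmatch(c.lower()) for c in cols)


def dir_value(root: str, file_path: str, level: int) -> Optional[str]:
    """Value of dir<level> for one file, derived from its path only.

    Assumption: dir0 is the first directory below the table root, dir1 the
    next one; a file not nested that deeply yields None (SQL NULL).
    """
    dirs = PurePosixPath(file_path).relative_to(root).parts[:-1]  # drop file name
    return dirs[level] if level < len(dirs) else None


def distinct_dir(root: str, files: Iterable[str], level: int) -> Set[Optional[str]]:
    """SELECT DISTINCT dir<level>: NULL counts as one distinct value, as in SQL."""
    return {dir_value(root, f, level) for f in files}


def min_dir(root: str, files: Iterable[str], level: int) -> Optional[str]:
    """SELECT MIN(dir<level>): NULLs are skipped; None if no non-NULL value exists."""
    values: List[str] = [
        v for v in (dir_value(root, f, level) for f in files) if v is not None
    ]
    return min(values) if values else None  # string (lexicographic) comparison
```

For a listing such as /data/largetable/2015/Q1/a.parquet, /data/largetable/2015/Q2/b.parquet and /data/largetable/2016/Q1/c.parquet, distinct dir0 is {"2015", "2016"} and min(dir0) is "2015", obtained without opening any file. Because the comparison is on strings, "10" sorts before "9"; a real implementation would have to follow the engine's own typing of these columns.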