[
https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124349#comment-15124349
]
Jason Altekruse commented on DRILL-4308:
----------------------------------------
To match your query more closely, I changed my current schema and fully
qualified the table name. I still get the same results.
{code}
0: jdbc:drill:zk=local> select dir0 from dfs.mxd.mock_data where dir0 = maxdir('dfs.mxd','mock_data') limit 1;
+-------+
| dir0  |
+-------+
| 1997  |
+-------+
1 row selected (0.125 seconds)
0: jdbc:drill:zk=local> select dir0 from dfs.mxd.mock_data where dir0 = mindir('dfs.mxd','mock_data') limit 1;
+-------+
| dir0  |
+-------+
| 1994  |
+-------+
1 row selected (0.116 seconds)
{code}
> Aggregate operations on dir<N> columns can be more efficient for certain use
> cases
> ----------------------------------------------------------------------------------
>
> Key: DRILL-4308
> URL: https://issues.apache.org/jira/browse/DRILL-4308
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.4.0
> Reporter: Aman Sinha
>
> For queries that perform only plain aggregates or DISTINCT operations on the
> directory partition columns (dir0, dir1, etc.) and reference no other columns,
> performance could be improved substantially by not scanning the entire
> dataset.
> Consider the following types of queries:
> {noformat}
> select min(dir0) from largetable;
> select distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir<N> columns is typically quite small, so
> there is no reason to scan the large table. This request has also come up as
> feedback from some Drill users. Of course, if any other column is referenced
> in the query (WHERE, ORDER BY, etc.), then we cannot apply this optimization.
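To illustrate why no scan is needed, here is a rough Python sketch of the idea
in the description above. It is not Drill's implementation: the function names
(list_dir0_values, min_dir0, distinct_dir0) and the demo layout are hypothetical,
and it assumes dir0 corresponds to the immediate subdirectories of the table
root, compared as strings. Files placed directly in the table root are ignored
here for simplicity.

```python
import os
import tempfile


def list_dir0_values(table_root):
    """Return the sorted names of the immediate subdirectories of table_root.

    Only directory metadata is read; no data file is opened.
    """
    with os.scandir(table_root) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def min_dir0(table_root):
    """Analogue of `select min(dir0) from largetable` (string comparison)."""
    values = list_dir0_values(table_root)
    return values[0] if values else None


def distinct_dir0(table_root):
    """Analogue of `select distinct dir0 from largetable`."""
    return list_dir0_values(table_root)


def build_demo_table(table_root, partitions):
    """Create a hypothetical partitioned layout with one small file each."""
    for name in partitions:
        partition_dir = os.path.join(table_root, name)
        os.makedirs(partition_dir)
        with open(os.path.join(partition_dir, "data.csv"), "w") as handle:
            handle.write("a,b\n1,2\n")


# Demo data only; these directory names are made up for the example.
demo_root = tempfile.mkdtemp()
build_demo_table(demo_root, ["1994", "1995", "1996", "1997"])

print(min_dir0(demo_root))
print(distinct_dir0(demo_root))
```

The cost of both functions depends on the number of partition directories, not
on the size of the files inside them, which is the saving the issue asks for.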