Jason Altekruse created DRILL-2173:
--------------------------------------
Summary: Enable querying partition information without reading all data
Key: DRILL-2173
URL: https://issues.apache.org/jira/browse/DRILL-2173
Project: Apache Drill
Issue Type: New Feature
Components: Query Planning & Optimization
Affects Versions: 0.7.0
Reporter: Jason Altekruse
Assignee: Jason Altekruse
When reading a series of files in nested directories, Drill currently adds
columns representing the directory structure that was traversed to reach the
file being read. These columns are stored as VARCHAR under the names dir0,
dir1, and so on. Because they are ordinary columns, Drill allows arbitrary
queries against them, including aggregates, filters, and sorts. To optimize
reads, basic partition pruning has already been added for expressions such as
dir0 = '2015' and for simple IN lists, which are converted during planning to
a series of OR-ed equality expressions. However, if users want to query the
directory information dynamically, without naming specific directories in the
query, Drill performs a full table scan followed by a filter on the dir
columns. This enhancement will allow more complex queries to be run against
directory metadata while scanning only the matching directories.
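To illustrate the idea, here is a minimal Python sketch, assuming a simple list of file paths under a table root. It derives dir0, dir1, ... from each path and evaluates a predicate using only that directory metadata, so files in non-matching directories are never opened. The functions dir_columns and prune_files and the example paths are hypothetical; this is not Drill's planner code.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List


# Hypothetical helper: derive Drill-style dir0, dir1, ... values from a file's
# path relative to the table root. Values are strings, like Drill's VARCHAR.
def dir_columns(root: str, file_path: str) -> Dict[str, str]:
    rel = PurePosixPath(file_path).relative_to(root)
    # Every path component except the file name itself is a directory level.
    return {f"dir{i}": part for i, part in enumerate(rel.parts[:-1])}


# Hypothetical pruning step: evaluate an arbitrary predicate on directory
# metadata only and keep just the files whose directories match.
def prune_files(root: str, files: List[str],
                predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    return [f for f in files if predicate(dir_columns(root, f))]


if __name__ == "__main__":
    root = "/data/logs"
    files = [
        "/data/logs/2014/12/a.parquet",
        "/data/logs/2015/01/b.parquet",
        "/data/logs/2015/02/c.parquet",
    ]

    # Roughly: WHERE dir0 = '2015' AND CAST(dir1 AS INT) >= 2
    print(prune_files(root, files,
                      lambda d: d.get("dir0") == "2015"
                      and int(d.get("dir1", "0")) >= 2))
    # → ['/data/logs/2015/02/c.parquet']

    # A "dynamic" query: find the latest dir0 from metadata alone, with no
    # hard-coded directory name, then scan only that directory.
    latest = max(dir_columns(root, f)["dir0"] for f in files)
    print(prune_files(root, files, lambda d: d.get("dir0") == latest))
    # → ['/data/logs/2015/01/b.parquet', '/data/logs/2015/02/c.parquet']
```

The second query shows the case described above: the directory to read is computed from the directory metadata itself rather than written into the query, yet no data files need to be read to decide which ones to scan.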