Aman Sinha created DRILL-2260:
---------------------------------
Summary: Add support for partitioning files by certain criteria when doing a CTAS
Key: DRILL-2260
URL: https://issues.apache.org/jira/browse/DRILL-2260
Project: Apache Drill
Issue Type: Improvement
Reporter: Aman Sinha
Assignee: Jacques Nadeau
CTAS statements that create a large number of files (thousands) are becoming
increasingly common. To do partition pruning, we need to organize the
files into subdirectories so that Drill can expose the directory names as
'dir0', 'dir1', etc. and prune on them. Currently, organizing these
files into subdirectories is a manual process and can be tedious.
We need to provide a mechanism to organize these output files into
subdirectories without manual intervention. We could add a PARTITIONED BY
<column> extension to the CTAS statement, similar to what Hive does.
One open question: if we partition by, say, the Month column, do we remove that
column from the output files, since its value is already represented by the
subdirectory names?
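
To make the proposed layout concrete, here is a minimal Python sketch of
writing one subdirectory per distinct value of the partition column. It is
not Drill code and does not reflect any existing Drill API: the function
name write_partitioned, the drop_partition_col flag, the CSV output format
and the file name part-0.csv are all assumptions made for illustration. The
flag corresponds to the open question about whether the partition column
should be kept in the output files.

```python
import csv
import os
from collections import defaultdict


def write_partitioned(rows, out_dir, partition_col, drop_partition_col=True):
    """Write rows (dicts) into one subdirectory per distinct partition value.

    Hypothetical helper for illustration only; not part of Drill.
    Returns a mapping from partition value to the path of the file written.
    """
    # Group the rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[partition_col])].append(row)

    written = {}
    for value, group in groups.items():
        # One subdirectory per partition value, e.g. <out_dir>/Jan/
        part_dir = os.path.join(out_dir, value)
        os.makedirs(part_dir, exist_ok=True)

        # Optionally omit the partition column, since the directory
        # name already carries its value.
        columns = [
            c for c in group[0]
            if not (drop_partition_col and c == partition_col)
        ]

        path = os.path.join(part_dir, "part-0.csv")  # assumed file name
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written[value] = path
    return written
```

With rows partitioned on Month, this would produce directories such as
<out_dir>/Jan and <out_dir>/Feb, which a query engine could then expose as
a directory-level column for pruning.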
Since this is a 'feature' that would span multiple components, I haven't
categorized it.