Chris Westin updated DRILL-2260:
--------------------------------
Fix Version/s: 1.0.0 (was: 0.9.0)
> Add support for partitioning files by certain criteria when doing a CTAS
> ------------------------------------------------------------------------
>
> Key: DRILL-2260
> URL: https://issues.apache.org/jira/browse/DRILL-2260
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Reporter: Aman Sinha
> Assignee: Jacques Nadeau
> Fix For: 1.0.0
>
>
> Doing a CTAS where we create a large number of files (thousands) is becoming
> increasingly common. In order to do partition pruning, we need to organize
> the files into subdirectories such that Drill can expose the directory names
> as 'dir0', 'dir1', etc. and perform pruning. Currently, the organization of
> these files into subdirectories is a manual process and can be tedious.
> We need to provide a mechanism to organize these output files into
> subdirectories without manual intervention. We could add a PARTITIONED BY
> <column> extension to the CTAS statement, similar to what Hive does.
> One open question: if we partition by the Month column, should that column
> be removed from the output files, since its value is already represented by
> the subdirectory names?
> Since this is a 'feature' that would span multiple components, I haven't
> categorized it.
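The following is a minimal sketch of the directory layout a CTAS with a PARTITIONED BY clause could produce, and of how pruning on 'dir0', 'dir1' would then skip whole subdirectories. It is illustrative only. The helpers `partition_rows` and `prune` are hypothetical and are not Drill APIs. The sketch also assumes each directory level is named with the bare partition value, so level i surfaces as dir<i>. The `drop_partition_cols` flag models the open question above about removing the partition column from the output files.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Path = Tuple[str, ...]


def partition_rows(rows: List[Row], partition_cols: List[str],
                   drop_partition_cols: bool = False) -> Dict[Path, List[Row]]:
    """Hypothetical helper: group rows into nested subdirectories,
    one directory level per partition column (level i ~ Drill's dir<i>)."""
    out: Dict[Path, List[Row]] = defaultdict(list)
    for row in rows:
        # Assumption: the directory name is the stringified column value.
        path = tuple(str(row[c]) for c in partition_cols)
        if drop_partition_cols:
            # The value is recoverable from the directory name, so omit it.
            row = {k: v for k, v in row.items() if k not in partition_cols}
        out[path].append(row)
    return dict(out)


def prune(paths, dir_filters: Dict[int, str]) -> List[Path]:
    """Hypothetical helper: keep only directories whose dir<i> equals the
    requested value, e.g. {0: "2015"} models WHERE dir0 = '2015'."""
    return [p for p in paths
            if all(i < len(p) and p[i] == v for i, v in dir_filters.items())]


if __name__ == "__main__":
    rows = [
        {"year": 2015, "month": 1, "amount": 10},
        {"year": 2015, "month": 2, "amount": 20},
        {"year": 2014, "month": 1, "amount": 5},
    ]
    parts = partition_rows(rows, ["year", "month"], drop_partition_cols=True)
    print(sorted(parts))             # [('2014', '1'), ('2015', '1'), ('2015', '2')]
    print(prune(parts, {0: "2015"}))  # [('2015', '1'), ('2015', '2')]
```

Dropping the partition column keeps the data files smaller. The trade-off is that a reader of an individual file can no longer see the column value without also knowing the directory path.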