[ 
https://issues.apache.org/jira/browse/DRILL-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Westin updated DRILL-2260:
--------------------------------
    Fix Version/s:     (was: 0.9.0)
                   1.0.0

> Add support for partitioning files by certain criteria when doing a CTAS
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2260
>                 URL: https://issues.apache.org/jira/browse/DRILL-2260
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>            Reporter: Aman Sinha
>            Assignee: Jacques Nadeau
>             Fix For: 1.0.0
>
>
> Doing a CTAS where we create a large number of files (thousands) is becoming 
> increasingly common.  In order to do partition pruning, we need to organize 
> the files into subdirectories such that Drill can expose the directory names 
> as 'dir0', 'dir1' etc. and perform pruning.  Currently, the organization of 
> these files into subdirectories is a manual process and can be tedious. 
> We need to provide a mechanism to organize these output files into 
> subdirectories without manual intervention.  We could add a PARTITIONED BY 
> <column> extension to the CTAS statement, similar to what Hive does.  
> One open question: if we partition by the Month column, do we remove that 
> column from the output files, since its values are already represented by 
> the subdirectory names?  
> Since this is a 'feature' that would span multiple components, I haven't 
> categorized it. 
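
To make the proposal above concrete, here is a minimal Python sketch of the directory layout it describes: one subdirectory per distinct value of the partition column, with a flag covering the open question of whether that column stays in the output files. This is only an illustration under stated assumptions, not Drill's implementation. The function name write_partitioned, the keep_partition_col flag, the CSV output format, and the file name part-0.csv are all hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_partitioned(rows, out_dir, partition_col, keep_partition_col=False):
    """Write rows (dicts) as CSV files under out_dir/<partition value>/.

    Hypothetical helper: assumes every row has the same keys and that
    partition values are safe to use as directory names.
    """
    # Group rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[partition_col])].append(row)

    written = []
    for value, group in sorted(groups.items()):
        subdir = os.path.join(out_dir, value)
        os.makedirs(subdir, exist_ok=True)
        # Optionally drop the partition column, since the directory
        # name already carries its value.
        fields = [c for c in group[0] if keep_partition_col or c != partition_col]
        path = os.path.join(subdir, "part-0.csv")  # placeholder file name
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


if __name__ == "__main__":
    sample = [
        {"month": "Jan", "amount": 10},
        {"month": "Feb", "amount": 20},
        {"month": "Jan", "amount": 30},
    ]
    with tempfile.TemporaryDirectory() as out:
        for p in write_partitioned(sample, out, "month"):
            print(os.path.relpath(p, out))
```

With a layout like this, a query that filters on the first directory level (what Drill exposes as 'dir0') only needs to read the matching subdirectory. Dropping the column saves space but means readers must recover the value from the directory name.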



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
