[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595303#comment-14595303
 ] 

Jacques Nadeau commented on DRILL-3246:
---------------------------------------

>> Since 'partition by' is more closely associated with a DDL statement, it 
>> would seem natural to have it after the CREATE TABLE

It isn't immediately clear why you think it would be more closely associated 
with DDL.

>> Could we not require the CTAS to always specify the columns in CREATE TABLE

It seems like we shouldn't require this unless necessary (e.g., when there is 
a dangerous ambiguity, as in the schemaless union case).

>> we should think about potential implications when extending the SELECT syntax

Always important.  I don't see how an ambiguity would occur given SELECT 
statement semantics, but we should always be on the lookout.

>> [Redshift puts it before the SELECT statement]

I'm all for being as consistent as possible with other systems.  Do we feel 
like that is the closest thing to a standard?

> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add a "PARTITION BY" clause to Drill's CTAS statement. The 
> "PARTITION BY" clause will specify which columns from the result table's 
> column list will be used to partition the data.  
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantic restrictions for the PARTITION BY clause:
>  - Every column in the PARTITION BY clause must either appear in the table's 
> column list or, when the base table in the SELECT_STATEMENT is schema-less, 
> be resolvable through a * column in the SELECT_STATEMENT.  Otherwise, a query 
> validation error is raised.
>  - When a partition column is resolved to a * column in a schema-less query, 
> that * column cannot be the result of a join operation. This restriction 
> exists because, for a * produced by a join, the query planner would not know 
> which table produces the partition column. 
> Examples:
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
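
The two restrictions in the quoted description can be read as a small decision procedure. The following is only a rough sketch of that reading in Python, not Drill's planner code: `SelectInfo`, `PartitionValidationError`, and `validate_partition_by` are hypothetical names made up for illustration, column matching is assumed to be by exact name, and the caller is assumed to already know whether the select list has a * over a schema-less source and whether that * comes from a join.

```python
from dataclasses import dataclass
from typing import List


class PartitionValidationError(ValueError):
    """Hypothetical error type standing in for a query validation error."""


@dataclass
class SelectInfo:
    """Hypothetical summary of the SELECT part of a CTAS statement."""
    # Explicitly named output columns, e.g. ["r_regionkey", "r_name"].
    columns: List[str]
    # True when the select list contains a * over a schema-less source.
    has_star: bool = False
    # True when that * is produced by a join of several tables.
    star_from_join: bool = False


def validate_partition_by(partition_cols: List[str], select: SelectInfo) -> None:
    """Raise PartitionValidationError if a partition column cannot be resolved."""
    for col in partition_cols:
        if col in select.columns:
            # Restriction 1, first branch: the column is in the column list.
            continue
        if not select.has_star:
            # Restriction 1, failure: not listed and no * to fall back on.
            raise PartitionValidationError(
                f"partition column {col!r} is not in the table's column list"
            )
        if select.star_from_join:
            # Restriction 2: a * out of a join cannot resolve the column,
            # because it is unknown which table would produce it.
            raise PartitionValidationError(
                f"partition column {col!r} resolves to * out of a join"
            )
        # Otherwise the column is resolved through a single-table *.
```

Under these assumptions, the mytable1, mytable2, and mytable3 statements pass the check, while the mytable4 and mytable5 statements are rejected by the first and second restriction respectively.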



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
