[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596404#comment-14596404
 ] 

Jinfeng Ni commented on DRILL-3246:
-----------------------------------

[~jnadeau], after discussed with [~amansinha100], we are more inclined to treat 
partition by clause as part of CTAS, in stead of part of SELECT statement :

1.   SQL standard has a well defined syntax for SELECT statement, which does 
not have partition by clause. 
Extending SELECT statement to add partition by / sort by / etc seems to lead to 
different syntax from SQL standard in this perspective.
2.   On the other hand, SQL standard does not specify partition by clause for 
CREATE TABLE either, since it is more like implementation dependent.  Different 
vendors would extend CREATE TABLE with vendor-specific syntax / implementation. 
So, seems it's probably more appropriate to extend CTAS in our use case. 
3.  Implementation-wise, if we treat PARTITION BY as part of SELECT statement, 
then the entire planner process in Calcite have to add such support, since it's 
a new component of SELECT statement.  That requires significant effort to 
implement.

Given that, we decide to stick to the current form of syntax for partition by 
clause.  We can open a new JIRA, if we decide to switch to the other form later 
on.  

(I merged the current form of code to master branch, since we have to give QA 
time to do testing).



> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add "PARTITION BY" clause in Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns out of the result 
> table's column list that will be used to partition the data.  
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantics restriction for the PARTITION BY clause:
>  -  All the columns in the PARTITION BY clause have to be in the table's 
> column list, or the SELECT_STATEMENT has a * column, when the base table in 
> the SELECT_STATEMENT is schema-less.  Otherwise, an query validation error 
> would be raised.
>  - When the partition column is resolved to * column in a schema-less query, 
> this * column could not be a result of join operation. This restriction is 
> added, since for * out of join operation, query planner would not know which 
> table might produce this partition column. 
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to