[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595166#comment-14595166
 ] 

Jinfeng Ni commented on DRILL-3246:
-----------------------------------

Oops. I did not realize you meant the ordering of clauses. 

The main reason for the current ordering of clauses is that in either Hive or a 
relational SQL DB, the PARTITION BY clause comes right after the table's column 
list:

{code} 
Create table t1 (c1, c2, c3 )
partition by c1
....
{code}

For our CTAS case, although the table's column list is optional, a user could 
still explicitly put the column list in the CTAS statement. That's why we chose 
that ordering.

On the other hand, in the case where the user does not explicitly list the 
columns, like:
{code}
CREATE TABLE T1
AS 
  SELECT C1, C2, C3
  FROM ...
{code}

putting PARTITION BY after the SELECT statement seems more logical. This is 
particularly true in your example, where the SELECT statement is SELECT *.

So, I agree it makes more sense to put partition by after the SELECT statement. 
Let me see if it has any impact on the SQL parser.


> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add a "PARTITION BY" clause to Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns out of the result 
> table's column list that will be used to partition the data.  
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantic restrictions for the PARTITION BY clause:
>  -  All the columns in the PARTITION BY clause have to be in the table's 
> column list, or the SELECT_STATEMENT has a * column, when the base table in 
> the SELECT_STATEMENT is schema-less.  Otherwise, a query validation error 
> would be raised.
>  - When the partition column is resolved to a * column in a schema-less query, 
> this * column cannot be the result of a join operation. This restriction is 
> added because, for a * out of a join operation, the query planner would not 
> know which table might produce this partition column. 
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
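
For illustration only, the two restrictions in the issue description could be checked roughly as below. This is a minimal Python sketch, not Drill's planner code: the function name `validate_partition_by` and its arguments are hypothetical, and exact-match column name comparison is an assumption made to keep the example short.

```python
def validate_partition_by(partition_cols, select_cols, has_star, star_from_join):
    """Hypothetical check mirroring the two PARTITION BY restrictions.

    partition_cols: column names listed in the PARTITION BY clause
    select_cols:    explicitly named columns in the table's column list
    has_star:       True if the SELECT list contains a * column
    star_from_join: True if that * column comes out of a join
    Returns a list of error strings; an empty list means the clause is valid.
    """
    errors = []
    named = set(select_cols)  # assumption: exact-match name comparison
    for col in partition_cols:
        if col in named:
            continue  # resolved against an explicitly listed column
        if not has_star:
            # Restriction 1: column is neither listed nor covered by a * column.
            errors.append(f"partition column '{col}' is not in the table's column list")
        elif star_from_join:
            # Restriction 2: cannot tell which joined table produces the column.
            errors.append(f"partition column '{col}' resolves to * out of a join")
    return errors


# The five examples from the issue description, reduced to the inputs above.
print(validate_partition_by(["r_regionkey"], ["r_regionkey", "r_name"], False, False))
print(validate_partition_by(["r_regionkey"], [], True, False))
print(validate_partition_by(["r_regionkey2"], ["r_regionkey", "r_name"], False, False))
print(validate_partition_by(["r_regionkey"], [], True, True))
```

The first two calls correspond to mytable1 and mytable2 and return no errors; the last two correspond to invalid cases 1 and 2 and each return one error.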


