[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596202#comment-14596202
 ] 

Jinfeng Ni commented on DRILL-3246:
-----------------------------------

After reading comments from [~jnadeau] and [~amansinha100], I feel the answers 
to where we put the partition clause would be related to the question whether 
the partition by is part of CTAS statement or SELECT statement embedded in 
CTAS. This question will determine the naming resolution for the columns in 
partition by.

If partition by is part of SELECT statement, then the naming resolution will 
apply the same logic as the one for order by / group by / where. 

{code}
CREATE TABLE T1 ( C1, C2, C3) as 
AS
SELECT  D1, D2, D3
FROM T2
{CODE}

In the above case, should user use partition by (C1), or use partition by (D1) 
? Or we should accept both?

I'm more inclined to treat partition by as a part of CTAS. That is, the 
partitioning columns' reference should be with the target table's column list, 
not the columns / alias in the SELECT statement.  

{code}
CREATE TABLE T1 ( C1, C2, C3) as 
PARTITION BY (D1)                        /* Not valid, since partition by only 
see "C1" in the target table */
AS
SELECT  D1, D2, D3
FROM T2

or:

CREATE TABLE T1 ( C1, C2, C3) as 
AS
SELECT  D1, D2, D3
FROM T2
PARTITION BY (D1)                        /* Not valid, since partition by only 
see "C1" in the target table */
{code} 

{code}
CREATE TABLE T1 as 
AS
SELECT  D1, D2, D3
FROM T2
PARTITION BY (D1 + 100)                        /* Not valid, since partition by 
only see "D1" in the target table */
{code}

Putting partition by right before "AS", would avoid the confusion that user 
might be using the columns only accessible in the SELECT statement.




> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add "PARTITION BY" clause in Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns out of the result 
> table's column list that will be used to partition the data.  
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantics restriction for the PARTITION BY clause:
>  -  All the columns in the PARTITION BY clause have to be in the table's 
> column list, or the SELECT_STATEMENT has a * column, when the base table in 
> the SELECT_STATEMENT is schema-less.  Otherwise, an query validation error 
> would be raised.
>  - When the partition column is resolved to * column in a schema-less query, 
> this * column could not be a result of join operation. This restriction is 
> added, since for * out of join operation, query planner would not know which 
> table might produce this partition column. 
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to