[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596697#comment-14596697
 ] 

Jinfeng Ni commented on DRILL-3246:
-----------------------------------

Here is what I found with respect of partitioning clause in CTAS, in Oracle, 
MySQL, DB2.

1. Oracle [1] : partition by clause before SELECT statement.  

{code}
[ column_properties ]
[ table_partitioning_clauses ]
[ CACHE | NOCACHE ]
[ parallel_clause ]
[ ROWDEPENDENCIES | NOROWDEPENDENCIES ]
[ enable_disable_clause ]
  [ enable_disable_clause ]...
[ row_movement_clause ]
[ AS subquery ]
{code} 

Actually, the link lead me to the wrong conclusion has an example, which I just 
missed completely. (my bad). 

2. MySQL[2] : partition by clause before SELECT statement. 
{code}
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
    [(create_definition,...)]
    [table_options]
    [partition_options]
    select_statement
{code}

3. DB2 LUW[3]:   Partition by clause after SELECT statement. 

Although the ordering is different for Oracle /DB2, both seem to restrict the 
partitioning expression to be a column in the targeted table, represented as an 
unqualified name. MySql seems to be different in that it allows general 
expression in the partitioning expression.

Looks like neither of the above three allows to use ordinal column reference in 
partition by clause.  However, if the ordinal reference is refer to the column 
list (explicitly or implicitly) of target table, then it should be fine to 
support ordinal reference, except for the case of select * in schema-less 
query.   The ordinal reference would not work for "ORDER BY" when select clause 
has * from schema-less table. 
 
To summary,  3 systems put partition by clause before SELECT statement, 1 
system does the opposite.  Only RedShift seems to support ordinal reference in 
partitioning by clause. 

[1]. 
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_7002.htm#i2215406
[2]. https://dev.mysql.com/doc/refman/5.7/en/create-table.html
[3]. 
https://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000927.html?cp=SSEPGG_9.7.0%2F2-10-6-90




> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add "PARTITION BY" clause in Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns out of the result 
> table's column list that will be used to partition the data.  
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantics restriction for the PARTITION BY clause:
>  -  All the columns in the PARTITION BY clause have to be in the table's 
> column list, or the SELECT_STATEMENT has a * column, when the base table in 
> the SELECT_STATEMENT is schema-less.  Otherwise, an query validation error 
> would be raised.
>  - When the partition column is resolved to * column in a schema-less query, 
> this * column could not be a result of join operation. This restriction is 
> added, since for * out of join operation, query planner would not know which 
> table might produce this partition column. 
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to