[ https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596909#comment-14596909 ]
Jinfeng Ni commented on DRILL-3246: ----------------------------------- Sounds good. Then, let's keep the current ordering of partition by clause in the code for now. > Query planning support for partition by clause in Drill's CTAS statement > ------------------------------------------------------------------------ > > Key: DRILL-3246 > URL: https://issues.apache.org/jira/browse/DRILL-3246 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization > Affects Versions: 1.0.0 > Reporter: Jinfeng Ni > Assignee: Jinfeng Ni > Fix For: 1.1.0 > > > We are going to add "PARTITION BY" clause in Drill's CTAS statement. The > "PARTITION BY" clause will specify the list of columns out of the result > table's column list that will be used to partition the data. > CREATE TABLE table_name [ (col_name, .... ) ] > [PARTITION BY (col_name, ...)] > AS SELECT_STATEMENT; > Semantics restriction for the PARTITION BY clause: > - All the columns in the PARTITION BY clause have to be in the table's > column list, or the SELECT_STATEMENT has a * column, when the base table in > the SELECT_STATEMENT is schema-less. Otherwise, an query validation error > would be raised. > - When the partition column is resolved to * column in a schema-less query, > this * column could not be a result of join operation. This restriction is > added, since for * out of join operation, query planner would not know which > table might produce this partition column. > Example : > {code} > create table mytable1 partition by (r_regionkey) as > select r_regionkey, r_name from cp.`tpch/region.parquet` > {code} > {code} > create table mytable2 partition by (r_regionkey) as > select * from cp.`tpch/region.parquet` > {code} > {code} > create table mytable3 partition by (r_regionkey) as > select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name > from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r > where n.n_regionkey = r.r_regionkey > {code} > Invalid case 1: Partition column is not in table's column list. > {code} > create table mytable4 partition by (r_regionkey2) as > select r_regionkey, r_name from cp.`tpch/region.parquet` > {code} > Invalid case 2: Partition column is resolved to * out of a join operator. > {code} > create table mytable5 partition by (r_regionkey) as > select * > from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r > where n.n_regionkey = r.r_regionkey > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)