[
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595265#comment-14595265
]
Jinfeng Ni commented on DRILL-3246:
-----------------------------------
I changed the ordering of partition by clause, and I did not find any issue for
the parsing logic.
If there is no objection, I'm going to change the syntax of partition by, such
that it will appear as an optional clause right after the select statement.
{code}
create table mytable1 as
select r_regionkey, r_name from cp.`tpch/region.parquet`
partition by (r_regionkey)
{code}
{code}
create table mytable2 as
select * from cp.`tpch/region.parquet`
partition by (r_regionkey)
{code}
{code}
create table mytable3 as
select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name
from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
where n.n_regionkey = r.r_regionkey
partition by (r_regionkey)
{code}
For [~jnadeau]'s question, HIVE seems to not allow partitioning on CTAS. I
checked the other relational DB systems, and I also did not see they support
partitioning on CTAS. One reason might be that they could achieve the
functionality of partitioning of CTAS using two statements:
1. Create table target_table partition by ...
2. Insert into table target_table
select from .....
> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
> Key: DRILL-3246
> URL: https://issues.apache.org/jira/browse/DRILL-3246
> Project: Apache Drill
> Issue Type: New Feature
> Components: Query Planning & Optimization
> Affects Versions: 1.0.0
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
>
> We are going to add "PARTITION BY" clause in Drill's CTAS statement. The
> "PARTITION BY" clause will specify the list of columns out of the result
> table's column list that will be used to partition the data.
> CREATE TABLE table_name [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantics restriction for the PARTITION BY clause:
> - All the columns in the PARTITION BY clause have to be in the table's
> column list, or the SELECT_STATEMENT has a * column, when the base table in
> the SELECT_STATEMENT is schema-less. Otherwise, an query validation error
> would be raised.
> - When the partition column is resolved to * column in a schema-less query,
> this * column could not be a result of join operation. This restriction is
> added, since for * out of join operation, query planner would not know which
> table might produce this partition column.
> Example :
> {code}
> create table mytable1 partition by (r_regionkey) as
> select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2 partition by (r_regionkey) as
> select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3 partition by (r_regionkey) as
> select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name
> from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
> where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list.
> {code}
> create table mytable4 partition by (r_regionkey2) as
> select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5 partition by (r_regionkey) as
> select *
> from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
> where n.n_regionkey = r.r_regionkey
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)