[jira] [Commented] (DRILL-3246) Query planning support for partition by clause in Drill's CTAS statement

Aman Sinha (JIRA) Sun, 21 Jun 2015 20:47:51 -0700

    [ https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595318#comment-14595318 ]


Aman Sinha commented on DRILL-3246:
-----------------------------------

>>> It isn't immediately clear why you think it would be more closely 
>>> associated with DDL?

I think of partitioning a table as similar to creating an index, defining a
primary/foreign key, or defining a sort key: it is a property of the target
table rather than of the input stream of data defined by the SELECT. Having
it as part of the SELECT could pose some difficulties, for instance:
 - If there is a UNION in the SELECT along with a partitioning clause, the
   scope of the partitioning clause would have to be defined. This would be
   similar to the ORDER BY clause, but I am not completely sure the two would
   work the same way; e.g. ORDER BY supports ordinals but, as far as I know,
   PARTITION BY does not.
 - If the partitioning column is an aggregate expression, a window function,
   or some other function, one would have to define an alias in the SELECT
   list; otherwise the PARTITION BY clause would need to repeat the exact
   function expression, similar to what we do for GROUP BY.
 - If there are duplicate columns in the SELECT list, I think the only
   option is to have the columns explicitly (unambiguously) specified in the
   CREATE TABLE statement.
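
To make the second and third points concrete, here is a rough sketch using
the CREATE TABLE ... PARTITION BY syntax proposed in the issue description.
The table names, the alias nation_cnt, and the column names rk1/rk2 are made
up for illustration, and the statements have not been run against a Drill
build.

{code}
-- Partition column is an aggregate: it has to be given an alias in the
-- SELECT list so that PARTITION BY can refer to it by name.
create table nations_per_region partition by (nation_cnt) as
  select n_regionkey, count(*) as nation_cnt
  from cp.`tpch/nation.parquet`
  group by n_regionkey
{code}
{code}
-- Duplicate columns in the SELECT list: the optional column list on
-- CREATE TABLE gives each output column a distinct name to partition on.
create table region_dup (rk1, rk2, r_name) partition by (rk1) as
  select r_regionkey, r_regionkey, r_name from cp.`tpch/region.parquet`
{code}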

Going forward, if we add a sort property to the table creation, I would think 
it would be easier to extend the CREATE TABLE syntax and have both partitioning 
and sortedness appear there. This way it does not conflict with the ORDER BY 
in the SELECT statement, since one could have ORDER BY 'a1' but still create 
the table with sortedness on column 'b1'. 
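
As a small illustration of that separation (hypothetical table name, not run
against a Drill build), the ORDER BY below belongs to the SELECT, while the
partitioning column is declared on the table being created. No syntax for a
table-level sort property has been proposed in this issue, so none is shown;
the idea is only that it would sit next to PARTITION BY rather than inside
the SELECT.

{code}
-- ORDER BY r_name is scoped to the query; PARTITION BY (r_regionkey) is a
-- property of the target table and is independent of that ordering.
create table region_by_key partition by (r_regionkey) as
  select r_regionkey, r_name from cp.`tpch/region.parquet`
  order by r_name
{code}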

> Query planning support for partition by clause in Drill's CTAS statement
> ------------------------------------------------------------------------
>
>                 Key: DRILL-3246
>                 URL: https://issues.apache.org/jira/browse/DRILL-3246
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.1.0
>
>
> We are going to add a "PARTITION BY" clause to Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns, out of the result 
> table's column list, that will be used to partition the data.
> CREATE TABLE  table_name  [ (col_name, .... ) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantic restrictions for the PARTITION BY clause:
>  - All the columns in the PARTITION BY clause have to be in the table's 
> column list, unless the SELECT_STATEMENT has a * column and the base table 
> in the SELECT_STATEMENT is schema-less. Otherwise, a query validation error 
> is raised.
>  - When a partition column is resolved to a * column in a schema-less query, 
> this * column cannot be the result of a join operation. This restriction is 
> added because, for a * coming out of a join, the query planner would not 
> know which table might produce the partition column. 
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
