[jira] Commented: (HIVE-1002) multi-partition inserts

Ning Zhang (JIRA) Wed, 03 Mar 2010 15:10:51 -0800

    [ https://issues.apache.org/jira/browse/HIVE-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840940#action_12840940 ]


Ning Zhang commented on HIVE-1002:
----------------------------------

I think it is good to let the user specify the partition columns just as is 
done currently. We will allow the user to leave some partition columns as 
dynamic partition columns, which means their values need not be given at 
compile time. Which partition a row is inserted into is determined at runtime. 
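
To illustrate the runtime part, here is a minimal Python sketch (not Hive 
code). It assumes the partition keys are known in DDL order, that the 
PARTITION clause supplies the static values, and that each row carries its 
dynamic values. The function name resolve_partition_path and the 
dictionary-based row layout are hypothetical.

```python
def resolve_partition_path(ddl_keys, static_spec, row):
    """Return the relative partition directory for one row.

    ddl_keys    -- partition column names in CREATE TABLE order
    static_spec -- values fixed in the PARTITION clause at compile time
    row         -- dynamic partition values carried by the row (assumed layout)
    """
    parts = []
    for key in ddl_keys:
        # A static value is used if given; otherwise the row supplies it at runtime.
        value = static_spec[key] if key in static_spec else row[key]
        parts.append(f"{key}={value}")
    return "/".join(parts)


# Example: ds is static, hr is dynamic and read from the row.
path = resolve_partition_path(["ds", "hr"], {"ds": "2010-03-03"}, {"hr": "12"})
```

With ds given statically and hr taken from the row, rows with different hr 
values land in different subdirectories of the same ds directory.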

However, one issue is that if the order of the partition columns in the DML 
differs from their order in the DDL, we should throw an error when a static 
partition column follows a dynamic partition column. For example,
{code}
insert overwrite table T partition (ds, hr=12) select ...
{code}

should throw an error. The reason is that the order of the partition columns 
determines the directory hierarchy (hr is a subdirectory of ds), and this is 
fixed at create-table time. If we allowed the above DML, we would need 
clear semantics: we should either change every ds partition that has a 
subdirectory hr=12, or completely overwrite the table and use a 
different directory hierarchy (ds being a subdirectory of hr). The first 
solution is potentially very expensive and rarely seen in practice. The second 
solution is potentially dangerous, since the user could accidentally enter the 
wrong order and the whole table would be overwritten rather than a few 
partitions updated. The second case also has a workaround: the user could 
create another partitioned table with a different partition column ordering 
and use the above DML to load data.
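
The check described above could be sketched roughly as follows. This is an 
illustration in Python, not the actual Hive implementation; the function name 
validate_partition_spec and the convention that None marks a dynamic column 
are assumptions made for the example.

```python
def validate_partition_spec(ddl_keys, dml_spec):
    """Raise ValueError if a static column sits below a dynamic one.

    ddl_keys -- partition columns in CREATE TABLE order, e.g. ["ds", "hr"]
    dml_spec -- dict built from the PARTITION clause; a value of None
                (or a missing key) marks the column as dynamic
    """
    unknown = set(dml_spec) - set(ddl_keys)
    if unknown:
        raise ValueError(f"not partition columns: {sorted(unknown)}")

    first_dynamic = None
    for key in ddl_keys:  # walk the directory hierarchy from the top down
        if dml_spec.get(key) is None:
            if first_dynamic is None:
                first_dynamic = key
        elif first_dynamic is not None:
            # A static value under a dynamic parent, e.g. partition (ds, hr=12).
            raise ValueError(
                f"static partition column '{key}' follows "
                f"dynamic partition column '{first_dynamic}'"
            )


# Accepted: static ds, dynamic hr.
validate_partition_spec(["ds", "hr"], {"ds": "2010-03-03", "hr": None})
```

Under this sketch, partition (ds, hr=12) is rejected at compile time, while a 
spec whose static columns form a prefix of the DDL order passes.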


> multi-partition inserts
> -----------------------
>
>                 Key: HIVE-1002
>                 URL: https://issues.apache.org/jira/browse/HIVE-1002
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Ning Zhang
>
> We should allow queries like this into a partitioned table:
> {code}
> CREATE TABLE x (a STRING, b STRING, c STRING)
> PARTITIONED BY (ds STRING, ts STRING);
> INSERT OVERWRITE TABLE x PARTITION (ds = '2009-12-12')
> SELECT a, b, c, ts FROM xxx;
> {code}
> Basically, allowing users to overwrite multiple partitions at a time.
> The partition values specified in PARTITION part (if any) should be a prefix 
> of the partition keys.
> The rest of the partition keys go at the end of the SELECT expression list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
