[jira] [Commented] (HIVE-10866) Give a error when client try to insert into bucketed table for SMB

Yongzhi Chen (JIRA) Fri, 12 Jun 2015 11:04:49 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583802#comment-14583802
 ]


Yongzhi Chen commented on HIVE-10866:
-------------------------------------

[~xuefuz], as we discussed, hive should throw an error when enforce.bucketing 
is true. I think patch 2 should meet the requirement. Please review the code. 
The 3 qtest files that I change are the tests the error will affect. Although 
some table called acid, the insert into is just ordinary statements. The error 
is thrown in execution time, I checked there is no file leak or something 
similar. For I would like the first insert into succeed. Thanks

> Give a error when client try to insert into bucketed table for SMB
> ------------------------------------------------------------------
>
>                 Key: HIVE-10866
>                 URL: https://issues.apache.org/jira/browse/HIVE-10866
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, 
> HIVE-10866.3.patch, HIVE-10866.4.patch
>
>
> Currently, hive does not support appends(insert into) bucketed table, see 
> open jira HIVE-3608. When insert into such table, the data will be 
> "corrupted" and not fit for sort merge bucket mapjoin. 
> We need find a way to prevent client from inserting into such table. Or at 
> least give a warning.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1 select code from sample_07 where 
> total_emp < 134354250 limit 10;
> After this first insert, I did:
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> 0: jdbc:hive2://localhost:10000> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> +-------+-------+
> | data  | data  |
> +-------+-------+
> +-------+-------+
> So select works fine. 
> Second insert:
> 0: jdbc:hive2://localhost:10000> insert into table buckettestoutput1 select 
> code from sample_07 where total_emp >= 134354250 limit 10;
> No rows affected (61.235 seconds)
> Then select:
> 0: jdbc:hive2://localhost:10000> select * from buckettestoutput1 a join 
> buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
> (state=42000,code=10141)
> 0: jdbc:hive2://localhost:10000>
> {noformat}
> Insert into empty table or partition will be fine, but insert into the 
> non-empty one (after second insert in the reproduce), the SMB mapjoin will 
> throw an error. We should not let second insert succeed when user explicitly 
> want to enforce bucketing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-10866) Give a error when client try to insert into bucketed table for SMB

Reply via email to