[
https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583802#comment-14583802
]
Yongzhi Chen commented on HIVE-10866:
-------------------------------------
[~xuefuz], as we discussed, Hive should throw an error when
hive.enforce.bucketing is true. I think patch 2 meets the requirement. Please
review the code. The 3 qtest files that I changed are the tests that the error
affects. Although some of the tables are called acid, the insert into
statements are just ordinary statements. The error is thrown at execution
time, and I checked that there is no file leak or anything similar. It is
thrown at execution time because I would like the first insert into to
succeed. Thanks
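
To illustrate the intended behavior, here is a rough sketch in Python. It is
not the code from the patch: the names check_insert_into, files_after_insert
and BucketedInsertError are made up for illustration, and the "one file per
bucket per insert" rule is a simplified model that matches the reproduce case
in the description (2 buckets, 4 files after two inserts).

```python
class BucketedInsertError(Exception):
    """Raised when an append would break a bucketed table's file layout."""


def check_insert_into(num_buckets, existing_files, enforce_bucketing):
    """Hypothetical execution-time guard; NOT the code from the patch.

    num_buckets:       bucket count from CLUSTERED BY ... INTO n BUCKETS
    existing_files:    data files already in the target table or partition
    enforce_bucketing: value of hive.enforce.bucketing
    """
    if num_buckets > 0 and enforce_bucketing and existing_files > 0:
        raise BucketedInsertError(
            "insert into a non-empty bucketed table is not supported "
            "while hive.enforce.bucketing is true"
        )


def files_after_insert(num_buckets, existing_files):
    # Simplified model (assumption): each insert writes one file per bucket.
    return existing_files + num_buckets


# First insert into the empty 2-bucket table passes the guard.
check_insert_into(num_buckets=2, existing_files=0, enforce_bucketing=True)
files = files_after_insert(2, 0)

# The second insert is rejected instead of adding more files to the table.
try:
    check_insert_into(num_buckets=2, existing_files=files, enforce_bucketing=True)
    rejected = False
except BucketedInsertError:
    rejected = True
```

In this model the first insert leaves 2 files for 2 buckets, and the second
insert fails before it can grow the table to 4 files, which is the state that
makes the SMB join report error 10141 in the reproduce steps.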
> Give a error when client try to insert into bucketed table for SMB
> ------------------------------------------------------------------
>
> Key: HIVE-10866
> URL: https://issues.apache.org/jira/browse/HIVE-10866
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Yongzhi Chen
> Assignee: Yongzhi Chen
> Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch,
> HIVE-10866.3.patch, HIVE-10866.4.patch
>
>
> Currently, Hive does not support appends (insert into) to a bucketed table;
> see the open JIRA HIVE-3608. After an insert into such a table, the data is
> "corrupted" and no longer fit for a sort merge bucket mapjoin.
> We need to find a way to prevent clients from inserting into such a table,
> or at least give a warning.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestoutput1(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1 select code from sample_07 where
> total_emp < 134354250 limit 10;
> After this first insert, I did:
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> 0: jdbc:hive2://localhost:10000> select * from buckettestoutput1 a join
> buckettestoutput2 b on (a.data=b.data);
> +-------+-------+
> | data | data |
> +-------+-------+
> +-------+-------+
> So select works fine.
> Second insert:
> 0: jdbc:hive2://localhost:10000> insert into table buckettestoutput1 select
> code from sample_07 where total_emp >= 134354250 limit 10;
> No rows affected (61.235 seconds)
> Then select:
> 0: jdbc:hive2://localhost:10000> select * from buckettestoutput1 a join
> buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 4
> (state=42000,code=10141)
> 0: jdbc:hive2://localhost:10000>
> {noformat}
> An insert into an empty table or partition is fine, but after an insert
> into a non-empty one (the second insert in the reproduce steps), the SMB
> mapjoin throws an error. We should not let the second insert succeed when
> the user explicitly wants to enforce bucketing.