[
https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565200#comment-14565200
]
Xuefu Zhang edited comment on HIVE-10283 at 5/29/15 9:29 PM:
-------------------------------------------------------------
[~xuefuz] && [~szehon], could you find someone who knows this part well to work
on the issue? Currently, in the upstream master code, the number of buckets is
not respected even with insert overwrite (insert overwrite creates only 1 bucket
file while the table definition specifies 2).
Reproduce:
{noformat}
create table buckettest (data string) partitioned by (state string) clustered
by (data) into 2 buckets;
set hive.enforce.bucketing = true;
insert overwrite table buckettest partition(state='MA') select code from jsmall
limit 10;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
0: jdbc:hive2://localhost:10000> select * from buckettest a join
buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of
buckets for table buckettest partition state=MA is 2, whereas the number of
files is 1 (state=42000,code=10141)
{noformat}
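To confirm the mismatch on disk, the declared bucket count can be compared with
the files actually written. A rough sketch, assuming the default database and
the default warehouse location (/user/hive/warehouse, the default value of
hive.metastore.warehouse.dir); the path is an assumption and will differ on
other setups:
{noformat}
-- "Num Buckets" in the output is the declared bucket count (2 for this table)
describe formatted buckettest partition(state='MA');
-- list the files written for the partition (assumed path, adjust as needed)
dfs -ls /user/hive/warehouse/buckettest/state=MA;
{noformat}
If the listing shows a single data file, that would match the "number of files
is 1" reported in the error above.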
was (Author: ychena):
> HIVE-4240 may be causing issue with bucketed tables
> ----------------------------------------------------
>
> Key: HIVE-10283
> URL: https://issues.apache.org/jira/browse/HIVE-10283
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues.
> Because of this, inserts will not consolidate 'buckets' into single files,
> which is problematic when attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput(
> data string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> Then I inserted the following data into the "buckettestinput" table:
> firstinsert1
> firstinsert2
> firstinsert3
> firstinsert4
> firstinsert5
> firstinsert6
> firstinsert7
> firstinsert8
> secondinsert1
> secondinsert2
> secondinsert3
> secondinsert4
> secondinsert5
> secondinsert6
> secondinsert7
> secondinsert8
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1
> select * from buckettestinput where data like 'first%';
> SELECT *
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s;
> insert into table buckettestoutput1
> select * from buckettestinput where data like 'second%';
> Check the results of the table sample query.
> For sort merge bucket map join:
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data)
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on
> (a.data=b.data);
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not
> correct. Fix the metadata or don't use bucketed mapjoin, by setting
> hive.enforce.bucketmapjoin to false. The number of buckets for table
> buckettestoutput1 is 2, whereas the number of files is 4
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)