[
https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565467#comment-14565467
]
Xuefu Zhang commented on HIVE-10283:
------------------------------------
If this is the case, then it's a serious issue. My guess is that the number of
reducers isn't set correctly. This seems to be a different issue from the one in
this JIRA. Could you please create a new JIRA? If this also happens in the 1.2
release, then we need to mark it as a blocker for the 1.2.1 release as well.
> HIVE-4240 may be causing issue with bucketed tables
> ----------------------------------------------------
>
> Key: HIVE-10283
> URL: https://issues.apache.org/jira/browse/HIVE-10283
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues.
> Because of this, inserts will not consolidate 'buckets' into single files,
> which is problematic when attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput(
> data string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> Then I inserted the following data into the "buckettestinput" table:
> firstinsert1
> firstinsert2
> firstinsert3
> firstinsert4
> firstinsert5
> firstinsert6
> firstinsert7
> firstinsert8
> secondinsert1
> secondinsert2
> secondinsert3
> secondinsert4
> secondinsert5
> secondinsert6
> secondinsert7
> secondinsert8
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting=true;
> insert into table buckettestoutput1
> select * from buckettestinput where data like 'first%';
> SELECT *
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s;
> insert into table buckettestoutput1
> select * from buckettestinput where data like 'second%';
> Check the results of the table sample query.
> For the sort merge bucket map join:
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on
> (a.data=b.data);
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not
> correct. Fix the metadata or don't use bucketed mapjoin, by setting
> hive.enforce.bucketmapjoin to false. The number of buckets for table
> buckettestoutput1 is 2, whereas the number of files is 4
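
For illustration, the file-count mismatch in the error above can be modelled in
a few lines of Python. This is only a sketch, not Hive's code: it assumes each
INSERT INTO adds one new file per bucket instead of appending to the existing
bucket file, and the helper names and the _copy_N file naming are hypothetical.

```python
def files_after_inserts(num_buckets, num_inserts):
    """Model each INSERT INTO as writing one new file per bucket (assumption)."""
    files = []
    for insert_no in range(num_inserts):
        for bucket in range(num_buckets):
            # Hypothetical naming: the first insert writes 000000_0, 000001_0, ...
            # and later inserts add a copy suffix rather than reusing the file.
            suffix = "" if insert_no == 0 else f"_copy_{insert_no}"
            files.append(f"{bucket:06d}_0{suffix}")
    return files


def bucket_metadata_ok(num_buckets, files):
    """Mimic the condition behind error 10141: file count must equal bucket count."""
    return len(files) == num_buckets


# Two inserts into a table declared with 2 buckets, as in the report.
files = files_after_inserts(num_buckets=2, num_inserts=2)
print(len(files), "files for 2 buckets, metadata ok:", bucket_metadata_ok(2, files))
```

Under this model, two inserts into a 2-bucket table leave 4 files, which matches
the counts in the error message; a single insert leaves 2 files and passes the
check.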