[
https://issues.apache.org/jira/browse/HIVE-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yongzhi Chen resolved HIVE-10879.
---------------------------------
Resolution: Duplicate
> The bucket number is not respected in insert overwrite.
> -------------------------------------------------------
>
> Key: HIVE-10879
> URL: https://issues.apache.org/jira/browse/HIVE-10879
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Yongzhi Chen
> Priority: Blocker
>
> When hive.enforce.bucketing is true, the bucket number defined in the table
> is no longer respected in current master and 1.2. This is a regression.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestinput(
> data string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2(
> data string
> )CLUSTERED BY(data)
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> Then I inserted the following data into the "buckettestinput" table
> firstinsert1
> firstinsert2
> firstinsert3
> firstinsert4
> firstinsert5
> firstinsert6
> firstinsert7
> firstinsert8
> secondinsert1
> secondinsert2
> secondinsert3
> secondinsert4
> secondinsert5
> secondinsert6
> secondinsert7
> secondinsert8
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting=true;
> insert overwrite table buckettestoutput1
> select * from buckettestinput where data like 'first%';
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number
> of buckets for table buckettestoutput1 is 2, whereas the number of files is 1
> (state=42000,code=10141)
> {noformat}
> The related debug information related to insert overwrite:
> {noformat}
> 0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1
> select * from buckettestinput where data like 'first%'insert overwrite table
> buckettestoutput1
> 0: jdbc:hive2://localhost:10000> ;
> select * from buckettestinput where data like '
> first%';
> INFO : Number of reduce tasks determined at compile time: 2
> INFO : In order to change the average load for a reducer (in bytes):
> INFO : set hive.exec.reducers.bytes.per.reducer=<number>
> INFO : In order to limit the maximum number of reducers:
> INFO : set hive.exec.reducers.max=<number>
> INFO : In order to set a constant number of reducers:
> INFO : set mapred.reduce.tasks=<number>
> INFO : Job running in-process (local Hadoop)
> INFO : 2015-06-01 11:09:29,650 Stage-1 map = 86%, reduce = 100%
> INFO : Ended Job = job_local107155352_0001
> INFO : Loading data to table default.buckettestoutput1 from
> file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
> INFO : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4,
> totalSize=52, rawDataSize=48]
> No rows affected (1.692 seconds)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)