[jira] [Commented] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables

Xuefu Zhang (JIRA) Fri, 29 May 2015 14:41:31 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565467#comment-14565467 ]


Xuefu Zhang commented on HIVE-10283:
------------------------------------

If this is the case, then it's a serious issue. My guess is that the number of 
reducers isn't set correctly. This seems to be a different issue from the one 
in this JIRA. Could you please create a new JIRA? If this also happens in the 
1.2 release, then we need to mark it as a blocker for the 1.2.1 release as well.
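
One way to check the reducer guess, as a rough sketch only: ask Hive for the 
plan of the insert from the issue description and see whether it contains a 
reduce stage at all. The table names and settings are the ones from the report 
below; what the plan looks like depends on the Hive version and execution 
engine, so treat this as an assumption rather than a confirmed diagnosis.

```sql
-- Same settings as in the report.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

-- Print the plan instead of running the insert. With bucketing enforced,
-- one would expect a reduce stage whose reducer count matches the 2 buckets
-- declared on buckettestoutput1.
EXPLAIN
INSERT INTO TABLE buckettestoutput1
SELECT * FROM buckettestinput WHERE data LIKE 'first%';
```

If the plan turns out to be map-only, that would be consistent with the 
reducer having been removed for this insert.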

> HIVE-4240 may be causing issue with bucketed tables 
> ----------------------------------------------------
>
>                 Key: HIVE-10283
>                 URL: https://issues.apache.org/jira/browse/HIVE-10283
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues. 
> Because of this, inserts will not consolidate 'buckets' into single files, 
> which is problematic when attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> Then I inserted the following data into the "buckettestinput" table: 
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8 
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'first%'; 
> SELECT * 
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'second%'; 
> Check the results of the table sample query. 
> For sort merge bucket map join: 
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); 
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on 
> (a.data=b.data); 
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not 
> correct. Fix the metadata or don't use bucketed mapjoin, by setting 
> hive.enforce.bucketmapjoin to false. The number of buckets for table 
> buckettestoutput1 is 2, whereas the number of files is 4 
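
The file count in the error above can be cross-checked from within Hive. The 
following is a minimal sketch using the INPUT__FILE__NAME virtual column 
against the table from the report; the actual file names and row counts depend 
on the environment and are not shown here.

```sql
-- List each data file backing buckettestoutput1 and how many rows it holds.
-- A table declared INTO 2 BUCKETS should be backed by exactly 2 files;
-- per the error message, 4 files would be listed after the two inserts.
SELECT INPUT__FILE__NAME AS data_file,
       count(*)          AS row_count
FROM buckettestoutput1
GROUP BY INPUT__FILE__NAME;
```

Running this after each of the two inserts should show whether the second 
insert adds new files rather than extending the existing bucket files.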



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
