Alain Schröder created HIVE-10083: ------------------------------------- Summary: SMBJoin fails in case one table is uninitialized Key: HIVE-10083 URL: https://issues.apache.org/jira/browse/HIVE-10083 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.1 Environment: MapR Hive 0.13 Reporter: Alain Schröder Priority: Minor
We experience IndexOutOfBoundsException in a SMBJoin in the case on the tables used for the JOIN is uninitialized. Everything works if both are uninitialized or initialized. {code} 2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549) at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51) {code} Simplest way to reproduce: {code} SET hive.enforce.sorting=true; SET hive.enforce.bucketing=true; SET hive.exec.dynamic.partition=true; SET mapreduce.reduce.import.limit=-1; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SET hive.auto.convert.join=true; SET hive.auto.convert.sortmerge.join=true; SET hive.auto.convert.sortmerge.join.noconditionaltask=true; CREATE DATABASE IF NOT EXISTS tmp; USE tmp; CREATE TABLE `test1` ( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS stored as orc; CREATE TABLE `test2`( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS STORED AS ORC; -- Initialize ONE table of the two tables with any data. INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100; SELECT t1.foo, t2.foo FROM test1 t1 INNER JOIN test2 t2 ON (t1.foo = t2.foo); {code} I took a look at the Procedure fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in AbstractBucketJoinProc.java and it does not seem to have changed from our MapR Hive 0.13 to current snapshot, so this should be also an error in the current Version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)