[jira] [Updated] (HIVE-10083) SMBJoin fails in case one table is uninitialized

JIRA Wed, 25 Mar 2015 08:44:49 -0700

     [ https://issues.apache.org/jira/browse/HIVE-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alain Schröder updated HIVE-10083:
----------------------------------
    Description: 
We see an IndexOutOfBoundsException in an SMB join when exactly one of the two 
tables used in the JOIN is uninitialized (no data has been inserted into it). 
Everything works if both tables are uninitialized or both are initialized.

{code}
2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486)
        at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429)
        at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540)
        at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549)
        at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51)
[...]
{code}

Simplest way to reproduce:

{code}
SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition=true;
SET mapreduce.reduce.input.limit=-1;

SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.join=true;
SET hive.auto.convert.sortmerge.join=true;
SET hive.auto.convert.sortmerge.join.noconditionaltask=true;

CREATE DATABASE IF NOT EXISTS tmp;
USE tmp;

CREATE TABLE `test1` (
  `foo` bigint )
CLUSTERED BY (
  foo)
SORTED BY (
  foo ASC)
INTO 384 BUCKETS
STORED AS ORC;

CREATE TABLE `test2` (
  `foo` bigint )
CLUSTERED BY (
  foo)
SORTED BY (
  foo ASC)
INTO 384 BUCKETS
STORED AS ORC;

-- Initialize ONE table of the two tables with any data.
INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100;

SELECT t1.foo, t2.foo
FROM test1 t1 INNER JOIN test2 t2 
ON (t1.foo = t2.foo);
{code}

I looked at the method 
fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in 
AbstractBucketJoinProc.java. It does not appear to have changed between our 
MapR Hive 0.13 and the current snapshot, so the bug should also be present in 
the current version.
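
One plausible reading of the stack trace, which this report does not confirm, is that the optimizer computes an index from the declared bucket count (384) and then reads from the list of bucket files that actually exist, which is empty for the uninitialized table. The Java sketch below illustrates that failure mode and a possible guard. The class, method, and parameter names, and the bucket file names, are hypothetical and are not taken from AbstractBucketJoinProc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: all names here are invented for illustration and do
// not reproduce Hive's AbstractBucketJoinProc.
final class BucketLookupSketch {

    // Unguarded lookup: derives the index from the DECLARED bucket count,
    // then reads from the list of bucket files that actually exist.
    static List<String> unguardedLookup(int bigBucketIndex, int declaredSmallBuckets,
                                        List<String> smallBucketFiles) {
        List<String> result = new ArrayList<>();
        int smallIndex = bigBucketIndex % declaredSmallBuckets;
        // Throws IndexOutOfBoundsException when smallBucketFiles is empty.
        result.add(smallBucketFiles.get(smallIndex));
        return result;
    }

    // Guarded lookup: returns no match when the expected bucket file is absent.
    static List<String> guardedLookup(int bigBucketIndex, int declaredSmallBuckets,
                                      List<String> smallBucketFiles) {
        int smallIndex = bigBucketIndex % declaredSmallBuckets;
        if (smallIndex >= smallBucketFiles.size()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(smallBucketFiles.get(smallIndex));
    }

    public static void main(String[] args) {
        // Stands in for test2: the table exists but has no bucket files yet.
        List<String> emptyTable = Collections.emptyList();
        // Illustrative file names for a table that has been loaded.
        List<String> loadedTable = Arrays.asList("000000_0", "000001_0");

        System.out.println(guardedLookup(0, 384, emptyTable));
        System.out.println(guardedLookup(1, 2, loadedTable));
        try {
            unguardedLookup(0, 384, emptyTable);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
    }
}
```

Whether an empty match list is the right behaviour for the join, or whether the optimizer should instead skip the SMB conversion when a table has no bucket files, is a design question for the actual fix.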


> SMBJoin fails in case one table is uninitialized
> ------------------------------------------------
>
>                 Key: HIVE-10083
>                 URL: https://issues.apache.org/jira/browse/HIVE-10083
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 0.13.0
>         Environment: MapR Hive 0.13
>            Reporter: Alain Schröder
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
