Eugene Koifman created HIVE-14004:
-------------------------------------

             Summary: Minor compaction produces ArrayIndexOutOfBoundsException: 
7 in SchemaEvolution.getFileType
                 Key: HIVE-14004
                 URL: https://issues.apache.org/jira/browse/HIVE-14004
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 2.2.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman


Easiest way to repro is to add TestTxnCommands2
{noformat}
  @Test
  public void testCompactWithDelete() throws Exception {
    int[][] tableData = {{1,2},{3,4}};
    runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
makeValuesClause(tableData));
    runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
    Worker t = new Worker();
    t.setThreadId((int) t.getId());
    t.setHiveConf(hiveConf);
    AtomicBoolean stop = new AtomicBoolean();
    AtomicBoolean looped = new AtomicBoolean();
    stop.set(true);
    t.init(stop, looped);
    t.run();
    runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
    runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
    runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
    t.run();
  }
{noformat}
to TestTxnCommands2 and run it.
Test won't fail but if you look 
in target/tmp/log/hive.log for the following exception (from Minor compaction).


{noformat}
2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
(LocalJobRunner.java:run(560)) - job_local1233973168_0005
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
~[hadoop-mapreduce-client-common-2.6.1.jar:?]
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
[hadoop-mapreduce-client-common-2.6.1.jar:?]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
        at 
org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
 ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
 ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
 ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
 ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
 ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) 
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
 ~[classes/:?]
        at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
~[classes/:?]
        at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207)
 ~[classes/:?]
        at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508)
 ~[classes/:?]
        at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
 ~[classes/:?]
        at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
 ~[classes/:?]
        at 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
 ~[classes/:?]
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
~[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
~[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
~[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
~[?:1.7.0_71]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
~[?:1.7.0_71]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[?:1.7.0_71]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
~[?:1.7.0_71]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
{noformat}

I observed the same on a real cluster.

Based on my observations, running Major compaction instead of minor, works fine.
Replacing the DELETE operation with update, makes both Major/Minor run fine.

The issue itself should be addressed by HIVE-13974 but need to make sure to add 
the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to