Oleksiy Sayankin created TEZ-3074:
-------------------------------------

             Summary: Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez
                 Key: TEZ-3074
                 URL: https://issues.apache.org/jira/browse/TEZ-3074
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.5.3
            Reporter: Oleksiy Sayankin
             Fix For: 0.5.3

*STEP 1. Install and configure Tez on YARN*

*STEP 2. Configure Hive for Tez*

*STEP 3. Create test tables in Hive and fill them with data*

Enable dynamic partitioning in Hive. Add the following to {{hive-site.xml}} and restart Hive.

{code:xml}
<!-- DYNAMIC PARTITION -->
<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>2000</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2000</value>
</property>
{code}

Execute on the command line. Use the attached file {{tempsource.data}}.

{code}
hadoop fs -put tempsource.data /
{code}

Execute in the Hive shell.

{code}
hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM tempsource;
{code}

*STEP 4. Mount NFS on the cluster*

*STEP 5. Run the teragen test application*

Use a separate console.

{code}
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 -Dmapreduce.map.cpu.vcores=0 1000000000 /user/hdfs/input
{code}

*STEP 6. Create many test files*

Use a separate console.

{code}
cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
for i in `seq 1 10000`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1; done
{code}

*STEP 7. Run the following query repeatedly in another console*

Use a separate console.

{code}
hive> insert overwrite table test3
      select x,y from (
        select x,y,z from (
          select x,y,z from ptest1 where x > 5 and x < 1000
          union all
          select x,y,z from ptest1 where x > 5 and x < 1000
        ) a
      ) b;
{code}

After running for some time, the query fails with an exception.

{noformat}
Status: Failed
Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01, diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed, vertex=vertex_1443452487059_0426_1_01 [Map 3], java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
	at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
]
Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
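
Since STEP 7 asks for the query to be rerun until it fails, the loop could be scripted rather than typed by hand. The following is only a sketch under assumptions: the helper name {{run_until_failure}} and the attempt limit are hypothetical and not part of Hive or Tez, and the commented invocation assumes the {{hive}} CLI is on the PATH and accepts a query through {{-e}}.

```shell
# Sketch: rerun a command until it fails, then report which attempt failed.
# run_until_failure is a hypothetical helper name, not a Hive or Tez tool.
run_until_failure() {
  max_attempts="$1"   # upper bound on the number of runs
  shift               # the remaining arguments are the command to run
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if ! "$@"; then
      echo "failed on attempt ${attempt}"
      return 1
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in ${max_attempts} attempts"
  return 0
}

# Example invocation with the STEP 7 query (assumes hive is on the PATH):
# QUERY='insert overwrite table test3 select x,y from ( select x,y,z from (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z from ptest1 where x > 5 and x < 1000) a)b;'
# run_until_failure 100 hive -e "$QUERY"
```

The helper stops at the first non-zero exit status, so the console output of the failing run (including the stack trace) is the last thing printed before the "failed on attempt" line.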