Joe McDonnell created IMPALA-6972:
-------------------------------------

             Summary: Dataload is intermittently failing on 2.x
                 Key: IMPALA-6972
                 URL: https://issues.apache.org/jira/browse/IMPALA-6972
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 2.13.0
            Reporter: Joe McDonnell
            Assignee: Joe McDonnell


Dataload on IMPALA_MINICLUSTER_PROFILE=2 and the 2.x branch is hitting 
IMPALA-6532, a concurrency issue in Hive that can fail with the 
following stack:
{noformat}
java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:171)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:208)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
The Hive issue can be fixed with a backport. Until that lands, note that the 
failure only occurs during dataload, because dataload runs Hive operations in 
parallel. It is affecting many builds, so temporarily disabling that 
parallelism makes sense.
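
As a rough illustration of what temporarily disabling parallelism could look like, the sketch below caps the number of concurrent tasks with a thread pool, so that a cap of 1 runs them one at a time. This is not the actual dataload code: run_hive_tasks, max_parallel, and ConcurrencyProbe are hypothetical names, and the tasks are stand-ins for real Hive operations.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_hive_tasks(tasks, max_parallel=1):
    """Run each callable in `tasks` with at most `max_parallel` in flight.

    Hypothetical helper: with max_parallel=1 the tasks execute serially,
    which sidesteps any race between concurrently running operations.
    Results are returned in submission order.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Future.result() re-raises any exception thrown by the task.
        return [future.result() for future in futures]


class ConcurrencyProbe:
    """Stand-in task that records the peak number of simultaneous calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate a short-running operation
        with self._lock:
            self._active -= 1
        return "ok"


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    run_hive_tasks([probe] * 8, max_parallel=1)
    print("peak concurrency with max_parallel=1:", probe.peak)
```

Keeping the cap as a parameter, rather than removing the pool, would make it straightforward to restore parallelism once the Hive fix is backported.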

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
