[ https://issues.apache.org/jira/browse/IMPALA-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-6972:
----------------------------------
    Fix Version/s: Impala 2.13.0

> Dataload is intermittently failing on 2.x
> -----------------------------------------
>
>                 Key: IMPALA-6972
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6972
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.13.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 2.13.0
>
>
> Dataload runs with IMPALA_MINICLUSTER_PROFILE=2 and on the 2.x branch are 
> hitting IMPALA-6532, a concurrency issue in Hive that can fail with the 
> following stack:
> {noformat}
> java.lang.Exception: java.lang.NullPointerException
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:171)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:208)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745){noformat}
> The Hive issue can be fixed with a backport. Until that lands, note that the 
> failure only occurs during dataload, because dataload runs Hive operations 
> in parallel. It is breaking many builds, so temporarily disabling that 
> parallelism makes sense.
>  
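
As a rough illustration of the workaround described above, the sketch below
shows one way a loader could make Hive parallelism switchable, falling back to
serial execution when the worker count is 1. This is not Impala's actual
dataload code: the function name run_hive_loads, the num_workers parameter,
and the idea of passing each load as a zero-argument callable are all
assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_hive_loads(load_fns, num_workers=1):
    """Run each zero-argument callable in load_fns and return their results.

    Hypothetical helper: with num_workers <= 1 the loads run one after
    another in the calling thread, which avoids concurrent Hive operations.
    With a larger value they are submitted to a thread pool.
    """
    if num_workers <= 1:
        # Serial path: no two loads are ever in flight at the same time.
        return [fn() for fn in load_fns]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fn) for fn in load_fns]
        # result() re-raises any exception from a load, and iterating the
        # futures in submission order keeps results in input order.
        return [future.result() for future in futures]
```

Defaulting num_workers to 1 keeps the serial behaviour in place until the Hive
fix is available; raising it later would restore parallel loading without
touching the callers.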



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)