[ https://issues.apache.org/jira/browse/IMPALA-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470655#comment-16470655 ]
ASF subversion and git services commented on IMPALA-6972:
---------------------------------------------------------
Commit b126b2d1053bde6671701af3931c7424a646cd54 in impala's branch
refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b126b2d ]
IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2

Hive 1.1.0 has a bug that can cause a
NullPointerException when Hive operations run in
parallel (see IMPALA-6532). Since IMPALA-6372,
dataload runs its Hive loads in parallel, so it
can hit this error on Hive 1.1.0 (i.e.
IMPALA_MINICLUSTER_PROFILE=2). This is breaking
builds on the 2.x branch.

This change disables parallel dataload for
IMPALA_MINICLUSTER_PROFILE=2.
IMPALA_MINICLUSTER_PROFILE=3 uses a newer version
of Hive that includes a fix for this bug, so
dataload stays parallel in that case.

Parallelism can be reenabled once Hive 1.1.0 gets
the fix from Hive 2.1.1.
Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6
Reviewed-on: http://gerrit.cloudera.org:8080/10306
Reviewed-by: Philip Zeyliger
Tested-by: Impala Public Jenkins
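
For readers who want a concrete picture of the gating described in the commit message, here is a minimal sketch in Python. It is not the code from the commit: the function names, the SERIAL_ONLY_PROFILES set, and the default worker count are all hypothetical, and only the IMPALA_MINICLUSTER_PROFILE variable name comes from the message above. It assumes the profile is read from the environment and that each Hive load can be wrapped in a zero-argument callable.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: profiles whose Hive version is affected by IMPALA-6532.
SERIAL_ONLY_PROFILES = {"2"}


def hive_load_parallelism(env=None, default_workers=4):
    """Return how many Hive load operations may run at the same time."""
    env = os.environ if env is None else env
    profile = env.get("IMPALA_MINICLUSTER_PROFILE", "")
    if profile in SERIAL_ONLY_PROFILES:
        # Affected Hive version: run one load at a time.
        return 1
    return default_workers


def run_hive_loads(load_fns, env=None):
    """Run each zero-argument load callable and return results in order."""
    workers = hive_load_parallelism(env)
    if workers == 1:
        # Serial path: no thread pool, so no concurrent Hive operations.
        return [fn() for fn in load_fns]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn) for fn in load_fns]
        return [future.result() for future in futures]
```

Keeping the decision in one small function means reenabling parallelism later would only require removing the profile from the set.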
> Dataload is intermittently failing on 2.x
> -----------------------------------------
>
> Key: IMPALA-6972
> URL: https://issues.apache.org/jira/browse/IMPALA-6972
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 2.13.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
>
> Dataload on IMPALA_MINICLUSTER_PROFILE=2 and the 2.x branch is hitting
> IMPALA-6532, a concurrency issue in Hive that can fail with the following
> stack:
> {noformat}
> java.lang.Exception: java.lang.NullPointerException
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:171)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:208)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The Hive issue can be fixed with a backport. Until that lands, the failure
> only occurs during dataload, because dataload runs Hive operations in
> parallel. It is breaking a lot of builds, so temporarily disabling
> parallelism makes sense.
>