[ 
https://issues.apache.org/jira/browse/IMPALA-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470655#comment-16470655
 ] 

ASF subversion and git services commented on IMPALA-6972:
---------------------------------------------------------

Commit b126b2d1053bde6671701af3931c7424a646cd54 in impala's branch 
refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b126b2d ]

IMPALA-6972: Disable parallel dataload on MINICLUSTER_PROFILE=2

There is a Hive bug in Hive 1.1.0 that can result
in a NullPointerException when doing parallel Hive
operations (see IMPALA-6532). Since dataload goes
parallel on Hive loads starting with IMPALA-6372,
dataload can hit this error on Hive 1.1.0 (i.e.
IMPALA_MINICLUSTER_PROFILE=2). This is impacting
builds on the 2.x branch.

This disables parallel dataload for IMPALA_MINICLUSTER_PROFILE=2.

IMPALA_MINICLUSTER_PROFILE=3 uses a newer version
of Hive that has a fix for this, so this continues
to use parallel dataload for that case.

Parallelism can be reenabled when Hive 1.1.0 gets the
fix from Hive 2.1.1.

Change-Id: I90a0f2b3756d7192fa7db2958031b8c88eb606e6
Reviewed-on: http://gerrit.cloudera.org:8080/10306
Reviewed-by: Philip Zeyliger <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Dataload is intermittently failing on 2.x
> -----------------------------------------
>
>                 Key: IMPALA-6972
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6972
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.13.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>
> Dataload on IMPALA_MINICLUSTER_PROFILE=2 and the 2.x branch are hitting 
> IMPALA-6532. IMPALA-6532 is a concurrency issue in Hive that can fail with 
> the following stack:
> {noformat}
> java.lang.Exception: java.lang.NullPointerException
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:171)
>         at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:208)
>         at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
>         at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745){noformat}
> The Hive issue can be fixed with a backport, but while that is going on, this 
> is only happening during dataload because dataload goes parallel on Hive 
> operations. This is hitting a lot of builds, so temporarily disabling 
> parallelism makes sense.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to