[ https://issues.apache.org/jira/browse/TEZ-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-1543:
----------------------------------
Fix Version/s: 0.6.0
> Shuffle Errors on heavy load (causing task retries)
> ---------------------------------------------------
>
> Key: TEZ-1543
> URL: https://issues.apache.org/jira/browse/TEZ-1543
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Labels: performance
> Fix For: 0.6.0
>
> Attachments: TEZ-1543.1.patch, syn_app_with_issue.svg, with_patch.svg
>
>
> org.apache.tez.runtime.library.common.shuffle.impl.Shuffle: ShuffleRunner failed with error
> org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$ShuffleError: error in shuffle in fetcher [initialmap] #13
> at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:336)
> at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:318)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
> at org.apache.hadoop.io.WritableUtils.readStringSafely(WritableUtils.java:475)
> at org.apache.tez.runtime.library.common.shuffle.impl.ShuffleHeader.readFields(ShuffleHeader.java:82)
> at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:350)
> at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
> at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)