[
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450133#comment-16450133
]
Prabhu Joseph commented on MAPREDUCE-7026:
------------------------------------------
Thanks [~jlowe] for the review.
I have added instrumentation code to throw a FileNotFoundException. Below is the
complete response the ShuffleHandler sends.
{code}
TTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=UTF
name: mapreduce
version: 1.0.0
/tmp/file (No such file or directory)
{code}
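For context, this body comes out of ShuffleHandler's sendError() path. Below is a minimal sketch of how such a response can be assembled with the Netty 3 API that ShuffleHandler uses; it is an illustration only, not the actual ShuffleHandler code or the patch.
{code}
// Illustrative sketch only, not the actual ShuffleHandler code: how a
// sendError()-style method can build the response shown above.
import org.jboss.netty.buffer.ChannelBuffers;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.codec.http.DefaultHttpResponse;
import org.jboss.netty.handler.codec.http.HttpHeaders;
import org.jboss.netty.handler.codec.http.HttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponseStatus;
import org.jboss.netty.handler.codec.http.HttpVersion;
import org.jboss.netty.util.CharsetUtil;

class SendErrorSketch {
  void sendError(ChannelHandlerContext ctx, String message, HttpResponseStatus status) {
    HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, status);
    response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "text/plain; charset=UTF-8");
    // Shuffle version headers, matching "name: mapreduce" / "version: 1.0.0" above.
    response.setHeader("name", "mapreduce");
    response.setHeader("version", "1.0.0");
    // The exception text, e.g. "/tmp/file (No such file or directory)", becomes the body.
    response.setContent(ChannelBuffers.copiedBuffer(message, CharsetUtil.UTF_8));
    // Close the connection once the error has been written.
    ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
  }
}
{code}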
Below is a sample of the Fetcher log output with the patch applied.
{code}
2018-04-19 12:24:39,510 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle Handler:
name: mapreduce
version: 1.0.0
/tmp/file (No such file or directory)
{code}
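The gist of the change is that the Fetcher now drains the error body of the response and logs it, instead of discarding everything after the failed header parse. A rough sketch of that idea follows; the helper name readErrorBody and its wiring are illustrative, not the patch code.
{code}
// Illustrative sketch of the idea behind the patch, not the actual Fetcher change:
// when the shuffle header cannot be parsed, drain the rest of the response and
// log it so the ShuffleHandler's error text is not lost.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ErrorBodySketch {
  static String readErrorBody(InputStream in) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[1024];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buf.write(chunk, 0, n);
    }
    return buf.toString("UTF-8");
  }
}
// Used roughly as: LOG.warn("Error message from Shuffle Handler:\n" + readErrorBody(input));
{code}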
> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing because its reduce tasks cannot fetch the map output: the
> NodeManager ShuffleHandler fails to serve the map outputs with an
> IOException like the one below. The ShuffleHandler sends the actual error
> message in the response via sendError(), but the Fetcher does not log this
> message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers :
> java.io.IOException: Error Reading IndexFile
>     at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
>     at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>     at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>     at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>     at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>     at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_000007_0_10004/file.out.index did not match expected owner 'bde'
>     at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
>     at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
>     at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
>     at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>     at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
> {code}
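> The root cause in that trace is the SecureIOUtils owner check on the spill index file. Conceptually the check behaves as sketched below; this is an illustrative sketch under that assumption, not the Hadoop source, and the helper name checkOwner is made up.
> {code}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
>
> public class OwnerCheckSketch {
>   // The shuffle index file must be owned by the job's user; if another
>   // user (here 'hbase') owns it, the open is refused with an IOException.
>   static void checkOwner(Path file, String expectedOwner) throws IOException {
>     String actualOwner = Files.getOwner(file).getName();
>     if (!actualOwner.equals(expectedOwner)) {
>       throw new IOException("Owner '" + actualOwner + "' for path " + file
>           + " did not match expected owner '" + expectedOwner + "'");
>     }
>   }
> }
> {code}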
> Fetcher Logs below without the actual error message:
> {code}
> 2018-04-19 12:24:39,511 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Invalid map id
> java.lang.IllegalArgumentException: TaskAttemptId string : TTP/1.1 500 Internal Server Error
> Content-Type: text/plain; charset=UTF is not properly formed
>     at org.apache.hadoop.mapreduce.TaskAttemptID.forName(TaskAttemptID.java:201)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:517)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:345)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:200)
> {code}
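> The IllegalArgumentException above is only a symptom: the Fetcher passes the start of the HTTP error body to TaskAttemptID.forName(), which rejects it, and the message the ShuffleHandler actually sent is never logged. A minimal standalone demonstration of that parse failure (not Fetcher code):
> {code}
> import org.apache.hadoop.mapreduce.TaskAttemptID;
>
> public class InvalidMapIdDemo {
>   public static void main(String[] args) {
>     // The error body starts with the HTTP status line, which is not a valid
>     // task attempt id, so this throws:
>     // "TaskAttemptId string : TTP/1.1 500 ... is not properly formed"
>     TaskAttemptID.forName("TTP/1.1 500 Internal Server Error");
>   }
> }
> {code}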