[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated MAPREDUCE-7026:
--------------------------------------
    Labels: pull-request-available supportability  (was: supportability)

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Assignee: Ashutosh Gupta
>            Priority: Major
>              Labels: pull-request-available, supportability
>         Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, MAPREDUCE-7026.3.patch
>
>
> A job is failing because its reduce tasks cannot fetch map output: the NodeManager ShuffleHandler fails to serve the map outputs with an IOException like the one below. ShuffleHandler sends the actual error message in the HTTP response via sendError(), but the Fetcher never logs that message.
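>
> For illustration only (a simplified sketch, not the actual Fetcher code or the attached patches): the fetch side could read the body that sendError() wrote and include it in its log message, along the lines of:
> {code}
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.net.HttpURLConnection;
> import java.nio.charset.StandardCharsets;
> import java.util.stream.Collectors;
>
> public class ShuffleErrorBodySketch {
>   // Hypothetical helper: on a failed shuffle request, read the error body the
>   // server wrote via sendError() so the fetcher can log the real cause.
>   static String describeFailure(HttpURLConnection connection) throws IOException {
>     int rc = connection.getResponseCode();
>     InputStream err = connection.getErrorStream();  // body written by sendError(); may be null
>     String body = "";
>     if (err != null) {
>       try (BufferedReader reader =
>                new BufferedReader(new InputStreamReader(err, StandardCharsets.UTF_8))) {
>         body = reader.lines().collect(Collectors.joining(" "));
>       }
>     }
>     return "Shuffle fetch failed with HTTP " + rc
>         + (body.isEmpty() ? "" : ", server said: " + body);
>   }
> }
> {code}
> With something along these lines, the reducer attempt log would also surface the owner-mismatch message from the NodeManager log below instead of only the generic shuffle error.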
>
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers :
> java.io.IOException: Error Reading IndexFile
>         at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
>         at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>         at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>         at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_000007_0_10004/file.out.index did not match expected owner 'bde'
>         at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
>         at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
>         at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>         at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
> {code}
>
> Fetcher logs below, without the actual error message:
> {code}
> 2018-04-19 12:24:39,521 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:369)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:200)
> {code}
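>
> The root cause shown above (the owner mismatch on file.out.index) therefore never reaches the reducer's log; only the generic "Exceeded MAX_FAILED_UNIQUE_FETCHES" error does. As a conceptual illustration only (not Hadoop's SecureIOUtils implementation), the kind of ownership check that produces that message looks roughly like:
> {code}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
>
> public class OwnerCheckSketch {
>   // Conceptual sketch of an ownership check on a local shuffle file; the real
>   // check lives in SecureIOUtils.checkStat(), this is only an illustration.
>   static void checkOwner(Path file, String expectedOwner) throws IOException {
>     String actualOwner = Files.getOwner(file).getName();
>     if (!actualOwner.equals(expectedOwner)) {
>       throw new IOException("Owner '" + actualOwner + "' for path " + file
>           + " did not match expected owner '" + expectedOwner + "'");
>     }
>   }
> }
> {code}
> Here the index file under the 'bde' user cache was owned by 'hbase', so the secure open was refused and fetches of that map output kept failing, while the reducer only saw the bail-out message above.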