[ https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675154#comment-16675154 ]
Peter Bacsko commented on MAPREDUCE-7156:
-----------------------------------------

[~jlowe] could you review this change please?

> NullPointerException when reaching max shuffle connections
> ----------------------------------------------------------
>
> Key: MAPREDUCE-7156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.9.1, 3.1.1
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: MAPREDUCE-7156-001.patch
>
>
> When you hit the max number of shuffle connections, you can get a lot of
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_e22_1531424278071_55040_01_002295 since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current number of shuffle connections (360) is greater than or equal to the max allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error:
> java.lang.NullPointerException
>     at org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>     at org.jboss.netty.channel.Channels.fireWriteComplete(Channels.java:324)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:299)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
>     at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
>     at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
>     at org.jboss.netty.channel.Channels.write(Channels.java:725)
>     at org.jboss.netty.channel.Channels.write(Channels.java:686)
>     at org.jboss.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:1110)
>     at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1252)
>     at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The solution seems to be a one-liner: call {{super.channelOpen(ctx, evt);}} in
> {{Shuffle.channelOpen()}} in both cases, that is, both when the connection is
> accepted and when it is rejected because the limit has been reached. If we
> don't do this, the {{channelOpen}} event is never passed on to
> {{IdleStateHandler}}, so it is not initialized properly and gets a null
> attachment object when executing {{writeComplete()}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
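A minimal, dependency-free Java sketch of the mechanism described in the issue may help. It assumes, as the stack trace suggests, that the idle-state handler sits after {{Shuffle}} in the pipeline, and that the default {{channelOpen()}} behaves like Netty 3's {{SimpleChannelUpstreamHandler}}, which simply forwards the event to the next handler. All classes here ({{Handler}}, {{IdleLikeHandler}}, {{ShuffleLikeHandler}}, {{simulate}}) are hypothetical stand-ins, not Netty or Hadoop types, and this is not the attached patch. It only models one channel, using the limit of 360 from the log.

```java
/**
 * Hypothetical, dependency-free model of a Netty 3 style upstream pipeline.
 * None of these classes are Netty or Hadoop types.
 */
public class ChannelOpenPropagationDemo {

    /** Base handler: like SimpleChannelUpstreamHandler, it forwards events by default. */
    static class Handler {
        Handler next; // next handler in the upstream direction (null = end of pipeline)

        void channelOpen() {
            if (next != null) {
                next.channelOpen();
            }
        }

        void writeComplete() {
            if (next != null) {
                next.writeComplete();
            }
        }
    }

    /** Per-channel state, standing in for the idle-state handler's context attachment. */
    static final class State {
        long writes;
    }

    /** Mimics an idle-state handler: its state is created only when channelOpen arrives. */
    static final class IdleLikeHandler extends Handler {
        State state; // stays null if channelOpen is never forwarded to this handler

        @Override
        void channelOpen() {
            state = new State();
            super.channelOpen();
        }

        @Override
        void writeComplete() {
            state.writes++; // throws NullPointerException when state was never initialized
            super.writeComplete();
        }
    }

    /** Mimics Shuffle.channelOpen(): rejects channels once the connection limit is reached. */
    static final class ShuffleLikeHandler extends Handler {
        final int maxConnections;
        final boolean forwardWhenRejecting; // false = behaviour in the report, true = proposed fix
        int accepted;
        boolean closed;

        ShuffleLikeHandler(int maxConnections, int accepted, boolean forwardWhenRejecting) {
            this.maxConnections = maxConnections;
            this.accepted = accepted;
            this.forwardWhenRejecting = forwardWhenRejecting;
        }

        @Override
        void channelOpen() {
            if (maxConnections > 0 && accepted >= maxConnections) {
                closed = true; // stands in for closing the rejected channel
                if (forwardWhenRejecting) {
                    super.channelOpen(); // proposed fix: forward the event in this branch too
                }
                return;
            }
            accepted++;
            super.channelOpen(); // the accepting branch already forwards the event
        }
    }

    /**
     * Opens one channel when alreadyAccepted connections exist (limit 360), then fires
     * writeComplete. Returns true if no NullPointerException was thrown.
     */
    static boolean simulate(int alreadyAccepted, boolean forwardWhenRejecting) {
        ShuffleLikeHandler shuffle = new ShuffleLikeHandler(360, alreadyAccepted, forwardWhenRejecting);
        shuffle.next = new IdleLikeHandler(); // pipeline order: shuffle -> idle
        shuffle.channelOpen();
        try {
            shuffle.writeComplete(); // a later write on the same channel, e.g. an SSL handshake write
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("at limit, no fix:   " + (simulate(360, false) ? "ok" : "NullPointerException"));
        System.out.println("at limit, with fix: " + (simulate(360, true) ? "ok" : "NullPointerException"));
        System.out.println("below limit:        " + (simulate(10, false) ? "ok" : "NullPointerException"));
    }
}
```

In this model only the at-limit case without the fix reports a {{NullPointerException}}; an accepted channel is fine either way because that branch already calls {{super.channelOpen()}}. The sketch deliberately ignores per-context attachments and real channel closing; it only illustrates why returning early without forwarding {{channelOpen}} leaves a later handler uninitialized.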