[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677036#comment-16677036
 ] 

Jason Lowe commented on MAPREDUCE-7156:
---------------------------------------

Thanks for updating the patch!  Weird, the unit test run executed zero tests 
and then failed.  It looks like the surefire JVM died before it ran any tests.  
I kicked off another precommit run to see whether it was just a hiccup.

> NullPointerException when reaching max shuffle connections
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.9.1, 3.1.1
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7156-001.patch, MAPREDUCE-7156-002.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of 
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Skipping monitoring container container_e22_1531424278071_55040_01_002295 
> since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
>         at 
> org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.writeComplete(SimpleChannelUpstreamHandler.java:233)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at 
> org.jboss.netty.channel.Channels.fireWriteComplete(Channels.java:324)
>         at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:299)
>         at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
>         at 
> org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
>         at 
> org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
>         at org.jboss.netty.channel.Channels.write(Channels.java:725)
>         at org.jboss.netty.channel.Channels.write(Channels.java:686)
>         at 
> org.jboss.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:1110)
>         at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1252)
>         at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
>         at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
>         at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
>         at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>         at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>         at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>         at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>         at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>         at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The solution seems to be a one-liner: you have to call 
> {{super.channelOpen(ctx, evt);}} in {{Shuffle.channelOpen()}} in both cases. 
> If we don't do this, then {{IdleStateHandler}} is not initialized properly 
> and gets a null attachment object when executing {{writeComplete()}}.
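
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use Netty or Hadoop classes: IdleTracker, UpstreamHandler and ShuffleLike are hypothetical stand-ins for IdleStateHandler, SimpleChannelUpstreamHandler and Shuffle, and the sketch assumes that the branch which rejects a connection is the one that returns without forwarding the open event.

```java
public class ChannelOpenSketch {

    /** Hypothetical stand-in for a handler that keeps per-channel state. */
    static class IdleTracker {
        // Plays the role of the attachment object; stays null until channelOpen() runs.
        private long[] state;

        void channelOpen() {
            state = new long[] { System.nanoTime() };
        }

        void writeComplete() {
            // Dereferences the state; throws NullPointerException if channelOpen() never ran.
            state[0] = System.nanoTime();
        }
    }

    /** Hypothetical base class whose methods forward each event to the next handler. */
    static class UpstreamHandler {
        protected final IdleTracker next;

        UpstreamHandler(IdleTracker next) {
            this.next = next;
        }

        void channelOpen() {
            next.channelOpen();
        }

        void writeComplete() {
            next.writeComplete();
        }
    }

    /** Hypothetical stand-in for Shuffle: rejects connections beyond a limit. */
    static class ShuffleLike extends UpstreamHandler {
        private final int maxConnections;
        private final boolean forwardWhenRejecting;
        private int open;

        ShuffleLike(IdleTracker next, int maxConnections, boolean forwardWhenRejecting) {
            super(next);
            this.maxConnections = maxConnections;
            this.forwardWhenRejecting = forwardWhenRejecting;
        }

        @Override
        void channelOpen() {
            if (open >= maxConnections) {
                // Rejecting branch. Without the super call, the next handler
                // never sees the open event and never creates its state.
                if (forwardWhenRejecting) {
                    super.channelOpen();
                }
                return;
            }
            open++;
            super.channelOpen();
        }
    }

    /** Opens one connection over the limit, then reports whether a write event hits an NPE. */
    static boolean rejectedWriteThrowsNpe(boolean forwardWhenRejecting) {
        ShuffleLike handler = new ShuffleLike(new IdleTracker(), 0, forwardWhenRejecting);
        handler.channelOpen();
        try {
            handler.writeComplete();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE without forwarding: " + rejectedWriteThrowsNpe(false));
        System.out.println("NPE with forwarding: " + rejectedWriteThrowsNpe(true));
    }
}
```

In this model, forwarding the open event in both branches is what lets the downstream handler set up its state before any later write event reaches it. Whether the real handlers behave the same way in every detail is not something this sketch establishes.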



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
