[ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712205#action_12712205 ]
Todd Lipcon commented on HADOOP-5890:
-------------------------------------

Woops, I pasted a bad example from the log... here's an example that actually demonstrates the behavior discussed:

{code}
2009-05-21 22:43:21,259 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 1
2009-05-21 22:43:21,259 WARN datanode.DataNode (DataXceiverServer.java:run(137)) - DatanodeRegistration(127.0.0.1:40197, storageID=DS-2052133204-127.0.1.1-40197-1242971000238, infoPort=52207, ipcPort=52592):DataXceiveServer: java.nio.channels.AsynchronousCloseException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
	at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
	at java.lang.Thread.run(Thread.java:619)
2009-05-21 22:43:21,315 INFO datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
2009-05-21 22:43:22,259 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
{code}

Note the exact 1-second gap between 22:43:21,259 and 22:43:22,259, even though the DataBlockScanner thread had already exited at 22:43:21,315. This patch reduces that delay significantly.

> Use exponential backoff on Thread.sleep during DN shutdown
> ----------------------------------------------------------
>
>                 Key: HADOOP-5890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5890
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-5890.txt
>
>
> Tests waste a lot of time in DataNode.shutdown.
> Typical logs look like:
> {code}
> 2009-05-21 17:13:20,177 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
> 2009-05-21 17:13:20,177 INFO datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
> 2009-05-21 17:13:21,117 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
> {code}
> In this example (and very commonly) the DataBlockScanner thread exits within 5-10ms after the first wait. The DN then sleeps an entire second before succeeding in shutting down.
> Using exponential backoff from a short value like 2ms up to a maximum of 1000ms would solve this.
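The backoff proposed above can be sketched in Java as a polling loop over a `ThreadGroup`. This is a minimal, hypothetical sketch, not the contents of hadoop-5890.txt or the actual `DataNode.shutdown` code: the class and method names (`BackoffShutdownWait`, `awaitThreadGroupExit`), the doubling factor, the overall timeout, and the use of `ThreadGroup.activeCount()` as the exit condition are all assumptions for illustration. Only the 2 ms starting sleep and the 1000 ms cap come from the issue description.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the hadoop-5890.txt patch): wait for a thread group
 * to drain, polling with exponentially growing sleeps instead of a fixed 1 s.
 */
public final class BackoffShutdownWait {

    /**
     * Polls group.activeCount() until it reaches zero or timeoutMs elapses.
     * Sleeps start at initialMs and double after every unsuccessful check,
     * capped at maxMs (doubling and the timeout are assumptions).
     *
     * @return true if the group drained before the timeout, false otherwise
     */
    static boolean awaitThreadGroupExit(ThreadGroup group, long initialMs,
                                        long maxMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long sleepMs = initialMs;
        // activeCount() is only an estimate, which is acceptable for polling.
        while (group.activeCount() > 0) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // gave up: threads are still running
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(sleepMs, remainingMs));
            // Exponential backoff: double the next sleep, but cap it at maxMs.
            sleepMs = Math.min(sleepMs * 2, maxMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadGroup group = new ThreadGroup("demo-workers");
        // Stand-in for a worker (like DataBlockScanner) that exits shortly
        // after shutdown begins.
        Thread worker = new Thread(group, () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker");
        worker.start();

        long start = System.nanoTime();
        boolean drained = awaitThreadGroupExit(group, 2, 1000, 10_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("drained=" + drained + " after about " + elapsedMs + " ms");
    }
}
```

With a 10 ms worker, the checks after the first one land at roughly 2, 6 and 14 ms, so the loop should return in well under the 1-second interval seen in the logs. Applying the same schedule to the log in the comment, and assuming the DataBlockScanner was the one remaining active thread, it exited 56 ms after the first check (22:43:21,259 to 22:43:21,315); checks at cumulative 2, 6, 14, 30 and 62 ms would notice that at about 62 ms instead of 1000 ms. The cap keeps every individual sleep no longer than the 1-second interval visible in the logs.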