[ https://issues.apache.org/jira/browse/HDFS-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
HaiBin Huang updated HDFS-14789:
--------------------------------
    Description: 
With HDFS-11194 and HDFS-11551, we can find slow nodes through the namenode's JMX. So I think the namenode should check for these slow nodes when assigning a node for writing a block. When the namenode chooses a node in org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault#*chooseRandom()*, we should check whether it belongs to the slow nodes, because choosing a slow one to write data may take a long time, which can make a client write data very slowly and even hit a socket timeout exception like this:

{code:java}
2019-08-19,17:16:41,181 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.net.SocketTimeoutException: 495000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=/xxx:xxx]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.hdfs.DFSOutputStream$Packet.writeTo(DFSOutputStream.java:328)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:653)
{code}

I use *maxChosenCount* to prevent the datanode-choosing task from taking too long. It is calculated from the logarithm of the probability, and it also guarantees that the probability of choosing a slow node to write a block is less than 0.01%. Finally, I use an expiry time so that the namenode does not choose these slow nodes within a specified period, because these slow nodes may have returned to normal after that period and can be used to write blocks again.
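The *maxChosenCount* bound described above can be sketched as follows (a hypothetical illustration, not the actual patch; the method and parameter names are assumptions): if a fraction p of datanodes are currently slow, the chance that k independent random picks all land on slow nodes is p^k, so bounding that below a target of 0.01% gives k >= log(0.0001) / log(p).

{code:java}
// Hypothetical sketch of deriving maxChosenCount from a target miss
// probability. Names (slowNodeFraction, target) are illustrative and
// not taken from the HDFS-14789 patch.
public class MaxChosenCount {

    // Probability that k consecutive random picks are all slow is p^k.
    // Solve p^k <= target  =>  k >= log(target) / log(p).
    static int maxChosenCount(double slowNodeFraction, double target) {
        if (slowNodeFraction <= 0) {
            return 1; // no slow nodes reported: a single pick suffices
        }
        return (int) Math.ceil(Math.log(target) / Math.log(slowNodeFraction));
    }

    public static void main(String[] args) {
        // e.g. half the nodes slow, want < 0.01% chance of ending on one
        System.out.println(maxChosenCount(0.5, 0.0001)); // prints 14
    }
}
{code}

With this bound, chooseRandom() can retry at most maxChosenCount times before accepting whatever node it has, keeping placement latency bounded while still making a slow pick statistically negligible.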
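The expiry mechanism described above might look like the following sketch (an assumption for illustration; the class and method names are hypothetical, not from the actual patch): a node reported slow is excluded from placement only until its TTL passes, after which it is presumed recovered and becomes eligible again.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: exclude nodes reported slow, but only for a TTL,
// so a node that has recovered becomes eligible for block placement again.
public class SlowNodeTracker {
    private final Map<String, Long> expiryByNode = new ConcurrentHashMap<>();
    private final long ttlMs;

    public SlowNodeTracker(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    // Called when JMX metrics report a datanode as slow.
    public void markSlow(String datanode, long nowMs) {
        expiryByNode.put(datanode, nowMs + ttlMs);
    }

    // chooseRandom() would skip any node for which this returns true.
    public boolean isExcluded(String datanode, long nowMs) {
        Long expiry = expiryByNode.get(datanode);
        if (expiry == null) {
            return false;
        }
        if (nowMs > expiry) {
            expiryByNode.remove(datanode); // TTL passed: assume recovered
            return false;
        }
        return true;
    }
}
{code}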
was: With [HDFS-11194|https://issues.apache.org/jira/browse/HDFS-11194] and [HDFS-11551|https://issues.apache.org/jira/browse/HDFS-11551], we can find slow node through namenode's jmx. So i think namenode should check these slow node when

> namenode should check slow node when assigning a node for writing block
> ------------------------------------------------------------------------
>
>                 Key: HDFS-14789
>                 URL: https://issues.apache.org/jira/browse/HDFS-14789
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: HaiBin Huang
>            Assignee: HaiBin Huang
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org