[ 
https://issues.apache.org/jira/browse/HDFS-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated HDFS-14789:
--------------------------------
    Description: 
With HDFS-11194 and HDFS-11551, the namenode can expose SlowPeersReport and 
SlowDisksReport through JMX. I think the namenode can use this information to 
avoid slow nodes when choosing a target in BlockPlacementPolicyDefault.

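For reference, the reports can be pulled straight from the namenode's /jmx 
servlet. Below is only a minimal sketch that scans the raw JMX output for the 
two attribute names; the namenode address (nn-host:9870) is a placeholder, and 
no particular bean name is assumed.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Minimal sketch: dump the namenode JMX output and print the lines that
// mention the slow-node reports. "nn-host:9870" is a placeholder for the
// namenode HTTP address (9870 is the default web port in Hadoop 3).
public class SlowPeersReportProbe {
  public static void main(String[] args) throws Exception {
    URL jmx = new URL("http://nn-host:9870/jmx");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(jmx.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.contains("SlowPeersReport") || line.contains("SlowDisksReport")) {
          System.out.println(line.trim());
        }
      }
    }
  }
}
{code}
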
Since slow nodes can be found through the namenode's JMX, the namenode should 
check for them when assigning a node for writing a block. When the namenode 
picks a node in 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault#*chooseRandom()*, 
it should check whether that node is a slow node, because choosing a slow node 
to write data may take a long time. That can make a client write data very 
slowly and even hit a socket timeout exception like this:

 
{code:java}
2019-08-19,17:16:41,181 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.net.SocketTimeoutException: 495000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=/xxx:xxx]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hdfs.DFSOutputStream$Packet.writeTo(DFSOutputStream.java:328)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:653)
{code}
 
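To make the check described above concrete, here is a standalone sketch of the 
selection loop, modeled with plain collections instead of the real 
NetworkTopology/BlockPlacementPolicyDefault API. The names (chooseTarget, 
slowNodes, maxChosenCount) are illustrative only; the actual change would live 
inside chooseRandom().

{code:java}
import java.util.List;
import java.util.Random;
import java.util.Set;

public class AvoidSlowNodeExample {
  // Keep drawing random candidates, skip the ones currently reported as slow,
  // and stop filtering after maxChosenCount attempts so target selection
  // cannot spend too long looking for a fast node.
  static String chooseTarget(List<String> nodes, Set<String> slowNodes,
                             int maxChosenCount, Random rand) {
    String candidate = null;
    for (int i = 0; i < maxChosenCount; i++) {
      candidate = nodes.get(rand.nextInt(nodes.size()));
      if (!slowNodes.contains(candidate)) {
        return candidate;          // found a node not reported as slow
      }
    }
    return candidate;              // give up filtering, accept the last pick
  }

  public static void main(String[] args) {
    List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
    Set<String> slowNodes = Set.of("dn3");
    System.out.println(chooseTarget(nodes, slowNodes, 5, new Random()));
  }
}
{code}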

I use *maxChosenCount* to prevent the datanode-choosing task from running too 
long. It is calculated from the logarithm of a probability, and it also bounds 
the probability of still choosing a slow node to write the block to less than 
0.01%.
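
The description does not spell the formula out, but one plausible reading is: 
if a fraction p of datanodes is currently slow, the chance that k independent 
random picks all land on slow nodes is p^k, so taking maxChosenCount = 
ceil(log(0.0001) / log(p)) keeps that chance below 0.01%. A hypothetical sketch 
of that calculation (the 10% slow-node fraction is just an example value):

{code:java}
public class MaxChosenCountExample {
  // p^k <= target  =>  k >= log(target) / log(p)
  static int maxChosenCount(double slowNodeFraction, double targetFailureProb) {
    return (int) Math.ceil(Math.log(targetFailureProb) / Math.log(slowNodeFraction));
  }

  public static void main(String[] args) {
    // With 10% of datanodes slow, about 4 attempts are enough,
    // since 0.1^4 = 1e-4 = 0.01%.
    System.out.println(maxChosenCount(0.10, 1e-4));
  }
}
{code}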

Finally, I use an expiry time so that the namenode avoids these slow nodes only 
for a specified period, because they may have returned to normal after that 
period and can then be used for writing blocks again.
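
A minimal sketch of what such a time-bounded exclusion could look like, 
assuming a simple map from datanode to expiry timestamp; the class name and the 
5-minute window are illustrative, not taken from the patch:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a time-bounded slow-node exclusion: a node reported
// as slow is avoided only until its entry expires, after which it becomes a
// valid write target again.
public class SlowNodeExpiryCache {
  private final Map<String, Long> expiryByNode = new ConcurrentHashMap<>();
  private final long expiryMs;

  SlowNodeExpiryCache(long expiryMs) {
    this.expiryMs = expiryMs;
  }

  void markSlow(String datanode) {
    expiryByNode.put(datanode, System.currentTimeMillis() + expiryMs);
  }

  boolean isCurrentlySlow(String datanode) {
    Long expiry = expiryByNode.get(datanode);
    if (expiry == null) {
      return false;                      // never reported as slow
    }
    if (System.currentTimeMillis() >= expiry) {
      expiryByNode.remove(datanode);     // window passed, treat as normal again
      return false;
    }
    return true;                         // still inside the avoidance window
  }

  public static void main(String[] args) {
    SlowNodeExpiryCache cache = new SlowNodeExpiryCache(5 * 60 * 1000L); // 5 min
    cache.markSlow("dn3");
    System.out.println(cache.isCurrentlySlow("dn3"));   // true
    System.out.println(cache.isCurrentlySlow("dn1"));   // false
  }
}
{code}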



> namenode should avoid slow node when choose target in 
> BlockPlacementPolicyDefault
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-14789
>                 URL: https://issues.apache.org/jira/browse/HDFS-14789
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haibin Huang
>            Assignee: Haibin Huang
>            Priority: Major
>         Attachments: HDFS-14789
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
