[ 
https://issues.apache.org/jira/browse/HDFS-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated HDFS-14789:
--------------------------------
    Description: 
With HDFS-11194 and HDFS-11551, the namenode can show SlowPeersReport and 
SlowDisksReport in JMX. I think the namenode can avoid these slow nodes during 
chooseTarget in BlockPlacementPolicyDefault, because a slow node in the 
pipeline can make the client write very slowly. 

I use an invalidityTime so that the namenode does not choose a slow node until 
its invalidity time has elapsed. After the invalidityTime, if the slow node has 
returned to normal, the namenode can choose it again; if it is still very slow, 
the invalidityTime is renewed and the node continues to be skipped.
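
The invalidityTime bookkeeping described above could look roughly like the 
following. This is only a minimal sketch, not the code in the attached patch: 
the class SlowNodeTracker, its method names, and the use of plain string node 
ids and caller-supplied millisecond timestamps are all assumptions made for 
illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not a class from the HDFS-14789 patch. */
class SlowNodeTracker {
  private final long invalidityTimeMs;
  // node id -> timestamp (ms) until which the node should not be chosen
  private final Map<String, Long> excludedUntil = new ConcurrentHashMap<>();

  SlowNodeTracker(long invalidityTimeMs) {
    this.invalidityTimeMs = invalidityTimeMs;
  }

  /** Called when a slow-node report names this node; starts or renews its window. */
  void reportSlow(String nodeId, long nowMs) {
    excludedUntil.put(nodeId, nowMs + invalidityTimeMs);
  }

  /** Returns true while the node's invalidity window is still open. */
  boolean isExcluded(String nodeId, long nowMs) {
    Long until = excludedUntil.get(nodeId);
    if (until == null) {
      return false; // never reported as slow
    }
    if (nowMs >= until) {
      // Window elapsed without a new report: the node may be chosen again.
      excludedUntil.remove(nodeId, until);
      return false;
    }
    return true;
  }
}
```

In this sketch a node that keeps showing up in the slow-node report keeps 
having its window renewed by reportSlow, while a node that has recovered simply 
ages out of the map.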

I also consider the fallback: if the namenode can't choose any normal node, 
chooseTarget will throw NotEnoughReplicasException and retry, this time without 
avoiding slow nodes.
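
The fallback above can be sketched as a two-pass selection. The code below is 
an illustration under stated assumptions, not the patch itself: FallbackChooser 
and its methods are hypothetical, nodes are plain strings, and 
NotEnoughNodesException is an unchecked stand-in for HDFS's 
NotEnoughReplicasException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical illustration of "avoid slow nodes first, then retry without avoiding". */
class FallbackChooser {
  /** Stand-in for NotEnoughReplicasException (unchecked here to keep the sketch short). */
  static class NotEnoughNodesException extends RuntimeException {
    NotEnoughNodesException(String msg) {
      super(msg);
    }
  }

  /** Picks `wanted` random nodes, optionally skipping the ones marked slow. */
  static List<String> choose(List<String> nodes, Set<String> slow, int wanted,
      boolean avoidSlow, Random rnd) {
    List<String> candidates = new ArrayList<>();
    for (String n : nodes) {
      if (!avoidSlow || !slow.contains(n)) {
        candidates.add(n);
      }
    }
    if (candidates.size() < wanted) {
      throw new NotEnoughNodesException(
          "wanted " + wanted + " but only " + candidates.size() + " candidates");
    }
    Collections.shuffle(candidates, rnd);
    return new ArrayList<>(candidates.subList(0, wanted));
  }

  /** First pass avoids slow nodes; on failure, retries once with slow nodes allowed. */
  static List<String> chooseWithFallback(List<String> nodes, Set<String> slow,
      int wanted, Random rnd) {
    try {
      return choose(nodes, slow, wanted, true, rnd);
    } catch (NotEnoughNodesException e) {
      return choose(nodes, slow, wanted, false, rnd);
    }
  }
}
```

If the second pass also fails, the exception propagates to the caller, which 
matches the behaviour one would expect when the cluster simply has too few 
nodes.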

 

 

  was:
With HDFS-11194 and HDFS-11551, the namenode can show SlowPeersReport and 
SlowDisksReport in JMX. I think the namenode can avoid these slow nodes 
while choosing a target in 

We can find slow nodes through the namenode's JMX, so I think the namenode 
should check for these slow nodes when assigning a node for writing a block. 
When the namenode chooses a node in 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault#*chooseRandom()*,
 we should check whether it is a slow node, because choosing a slow one 
to write data may take a long time, which can make a client write data very 
slowly and even hit a socket timeout exception like this:

 
{code:java}
2019-08-19,17:16:41,181 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
java.net.SocketTimeoutException: 495000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:xxx remote=/xxx:xxx]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.hdfs.DFSOutputStream$Packet.writeTo(DFSOutputStream.java:328)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:653)
{code}
 

I use *maxChosenCount* to keep the datanode selection from taking too long. It 
is calculated from the logarithm of a probability, and it also guarantees that 
the probability of choosing a slow node to write a block is less than 0.01%.
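
One plausible reading of the logarithm above, stated here as an assumption 
rather than as the formula in the patch: if a fraction p of the datanodes is 
slow and each random pick is independent, the chance that n picks in a row all 
land on slow nodes is p^n, so p^n <= 0.0001 holds once n >= ln(0.0001) / ln(p). 
The class and method names below are hypothetical.

```java
/** Hypothetical illustration; the patch may compute maxChosenCount differently. */
class MaxChosenCount {
  /**
   * Smallest number of independent random picks n such that
   * slowFraction^n <= targetProbability.
   */
  static int maxChosenCount(double slowFraction, double targetProbability) {
    if (!(slowFraction > 0.0 && slowFraction < 1.0)) {
      throw new IllegalArgumentException("slowFraction must be in (0, 1)");
    }
    if (!(targetProbability > 0.0 && targetProbability < 1.0)) {
      throw new IllegalArgumentException("targetProbability must be in (0, 1)");
    }
    // Both logarithms are negative, so the ratio is positive.
    double n = Math.log(targetProbability) / Math.log(slowFraction);
    return (int) Math.ceil(n);
  }
}
```

Under this assumption the bound grows slowly: even when half of the datanodes 
are slow, maxChosenCount(0.5, 0.0001) stays in the low teens, so capping the 
number of retries costs little.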

Finally, I use an expiry time so that the namenode does not choose these slow 
nodes within a specified period, because they may have returned to normal 
after that period and can be used to write blocks again.


> namenode should avoid slow node when choose target in 
> BlockPlacementPolicyDefault
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-14789
>                 URL: https://issues.apache.org/jira/browse/HDFS-14789
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haibin Huang
>            Assignee: Haibin Huang
>            Priority: Major
>         Attachments: HDFS-14789
>


