[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588663#comment-14588663
 ] 

Colin Patrick McCabe commented on HDFS-7923:
--------------------------------------------

Starvation is not a real concern here.  Imagine a 1000 node cluster where full 
block reports are 6 hours apart.  Then the NN needs to be able to handle about 
2.8 full block reports a minute (1000 reports every 360 minutes), or one every 
21.6 seconds.  If each one takes 500 ms (we'll be pessimistic), then 0.5 out of 
every 21.6 seconds is FBR time, or about 2.3% of the time.  If you want to be 
even more pessimistic and assume each block report is 1 hour apart rather than 
6, just multiply that number by 6 to get about 14% of the time.
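
The load estimate can be re-derived from the stated assumptions (1000 DNs, 
500 ms per report).  This is only a rough sketch: it assumes reports are spread 
evenly over the interval and processed one at a time, and the function name is 
made up for illustration.

```python
def fbr_time_fraction(num_datanodes, interval_hours, seconds_per_report):
    """Fraction of wall-clock time the NN spends processing full block reports.

    Assumes every DN sends exactly one FBR per interval, that the reports are
    spread evenly over the interval, and that they are processed serially.
    """
    interval_seconds = interval_hours * 3600
    reports_per_second = num_datanodes / interval_seconds
    return reports_per_second * seconds_per_report


six_hour = fbr_time_fraction(1000, 6, 0.5)
one_hour = fbr_time_fraction(1000, 1, 0.5)

print(f"6 h interval: {six_hour:.1%} of NN time")
print(f"1 h interval: {one_hour:.1%} of NN time")
```

Either way the result stays far below the 100% that starvation would need.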

For starvation to happen, you'd have to be spending close to 100% of the time 
on full block reports.  That's just not going to happen.  And if it does 
happen, you have bigger problems, like not being able to actually do anything 
on the NameNode (since you're spending all your time on FBRs, which hold the 
FSN write lock).

Even if you were spending close to 100% of the time on full block reports, the 
existing code doesn't enforce fairness... I can configure one DN to send full 
block reports every 30 minutes, and configure everyone else to send every 10 
hours.  The FBR period is a datanode-side configuration, not a NN-side one.

This change is really helpful during startup on big clusters.  In the past we 
have seen restarting all the DNs at once on a several hundred node cluster 
bring the NN to its knees.  All of the RPC handlers get flooded with FBRs, but 
only one can make progress at a time.  The flood of FBRs also triggers full GCs, 
since the NN can't process the reports quickly enough and they get promoted to 
the old generation.  I realize that {{dfs.blockreport.initialDelay}} was 
designed as a workaround, but it is difficult to know what value to set it to, 
it results in slower startup, and it is often overlooked in real-world 
deployments.

If we want to work on enforcing fairness on the NN-side, we can do that, but it 
seems unrelated to this change to me.  It's also not something we currently do, 
so it would be nice to see data showing that it was helpful.

> The DataNodes should rate-limit their full block reports by asking the NN on 
> heartbeat messages
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7923
>                 URL: https://issues.apache.org/jira/browse/HDFS-7923
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.8.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 2.8.0
>
>         Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
> HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
> HDFS-7923.006.patch, HDFS-7923.007.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this 
> by first sending a heartbeat message to the NN with an optional boolean set 
> which requests permission to send a full block report.  If the NN responds 
> with another optional boolean set, the DN will send an FBR... if not, it will 
> wait until later.  This can be done compatibly with optional fields.
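
The handshake described in the issue can be sketched as a toy model.  This is 
not the code from the patch: the class, method, and parameter names are 
hypothetical, and the cap on concurrently granted reports is an assumed policy, 
since the description does not say how the NN decides.  The optional booleans 
are modelled as a defaulted argument and a return value.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Toy NN that lets only a few DNs send a full block report at once."""

    max_concurrent_fbrs: int = 2  # hypothetical limit, not a real HDFS key
    granted: set = field(default_factory=set)

    def heartbeat(self, dn_id, request_fbr=False):
        """Handle a heartbeat; return True if the DN may send its FBR now."""
        if not request_fbr:
            return False  # plain heartbeat, nothing to grant
        if dn_id in self.granted:
            return True  # permission already outstanding for this DN
        if len(self.granted) < self.max_concurrent_fbrs:
            self.granted.add(dn_id)
            return True
        return False  # DN should wait and ask again on a later heartbeat

    def receive_fbr(self, dn_id):
        """Accept an FBR from a DN holding permission, then free its slot."""
        if dn_id not in self.granted:
            raise PermissionError(f"{dn_id} sent an FBR without permission")
        self.granted.discard(dn_id)


nn = NameNodeSketch(max_concurrent_fbrs=2)
answers = [nn.heartbeat(f"dn{i}", request_fbr=True) for i in range(4)]
print(answers)  # [True, True, False, False]

nn.receive_fbr("dn0")  # dn0 finishes, freeing a slot
retry = nn.heartbeat("dn2", request_fbr=True)
print(retry)  # True
```

A DN that is turned down simply keeps heartbeating and asks again later, so a 
mass restart becomes a trickle of reports rather than a flood.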



