[ 
https://issues.apache.org/jira/browse/HDFS-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801701#action_12801701
 ] 

dhruba borthakur commented on HDFS-839:
---------------------------------------

> It seems like the network overhead of having each DN send its block report to 
> each backup node on a large cluster would be higher than a stream from the NN 
> to the backup node

I agree. But since this traffic goes straight from the datanode(s) to the 
backupnode(s), the load will be spread more or less evenly across all the 
datanodes. The alternative is that the NN has to stream all blockReceived 
messages to the backupnode, which could mean that you need namenode machines 
with greater horsepower. If the NN is streaming all blockReceived messages to 
one backupnode, it could still be fine, but if it has to stream them to 
multiple backupnodes in parallel, that could hurt namenode performance 
considerably. On the other hand, if the namenode sends these blockReceived 
messages down a pipeline of backupnodes, then the namenode has to go through a 
complex procedure to handle errors (if any) from the n-th backupnode in the 
pipeline. 
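
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of 
how many blockReceived sends each node originates under the three designs 
discussed above. The function name, the design labels, and the assumption that 
block events are spread evenly over the datanodes are all hypothetical; this 
is a counting model only, not anything taken from HDFS code.

```python
def outbound_messages(block_events, datanodes, backups):
    """Count blockReceived sends originated per node (hypothetical model)."""
    # Assumption: block events are spread evenly over the datanodes.
    per_dn = block_events / datanodes
    return {
        # Each datanode reports to the NN and directly to every backupnode.
        "dn_direct": {"datanode": per_dn * (1 + backups), "namenode": 0},
        # The NN forwards every event to every backupnode in parallel.
        "nn_fanout": {"datanode": per_dn, "namenode": block_events * backups},
        # The NN sends each event once, to the head of a backupnode pipeline.
        "nn_pipeline": {
            "datanode": per_dn,
            "namenode": block_events if backups else 0,
        },
    }


if __name__ == "__main__":
    # Illustrative numbers only: 1M block events, 1000 datanodes, 3 backupnodes.
    for design, load in outbound_messages(1_000_000, 1000, 3).items():
        print(design, load)
```

In this model the direct design multiplies each datanode's small share by 
(1 + number of backupnodes), while the parallel fan-out multiplies the 
namenode's entire event stream by the number of backupnodes. The pipeline 
keeps the namenode's send count flat, but the model says nothing about the 
error handling cost mentioned above.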

> The NameNode should forward block reports to BackupNode
> -------------------------------------------------------
>
>                 Key: HDFS-839
>                 URL: https://issues.apache.org/jira/browse/HDFS-839
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The BackupNode (via HADOOP-4539) receives a stream of transactions from 
> NameNode. However, the BackupNode does not have block locations of blocks. It 
> would be nice if the NameNode can forward all block reports (that it receives 
> from DataNodes) to the BackupNode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
