[
https://issues.apache.org/jira/browse/HDFS-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801701#action_12801701
]
dhruba borthakur commented on HDFS-839:
---------------------------------------
> It seems like the network overhead of having each DN send its block report to
> each backup node on a large cluster would be higher than a stream from the NN
> to the backup node
I agree. But because this traffic goes straight from the datanode(s) to the
backup node(s), the load will be spread roughly evenly across all the datanodes.
The alternative is for the NN to stream all blockReceived messages to the
backup node, which could mean you need namenode machines with more horsepower.
If the NN streams all blockReceived messages to a single backup node, that
might still be fine, but streaming them to multiple backup nodes in parallel
could noticeably hurt namenode performance. On the other hand, if the namenode
pipelines these blockReceived messages through a chain of backup nodes, then
the namenode has to go through a complex procedure to handle errors (if any)
from the n-th backup node in the pipeline.
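To make the trade-off above concrete, here is a toy message-count model of the three options: datanodes reporting directly to every backup node, the namenode fanning out to every backup node, and the namenode feeding a pipeline of backup nodes. It is a rough sketch under simplifying assumptions, not HDFS code: every blockReceived notification is counted as one message, the function and class names are hypothetical, and the pipeline's error-handling cost is not modeled at all.

```python
# Toy model (hypothetical, not HDFS code): count messages needed to deliver
# blockReceived notifications to `backups` backup nodes under three strategies.
# Assumption: each notification is one message; error handling is ignored.
from dataclasses import dataclass


@dataclass
class Load:
    namenode_sends: int      # extra messages the NameNode itself must send
    max_datanode_sends: int  # most messages sent by any single DataNode


def direct_from_datanodes(received_per_dn: list[int], backups: int) -> Load:
    # Each DataNode sends every notification to the NN and to every backup node,
    # so the extra traffic is spread across the DataNodes.
    return Load(namenode_sends=0,
                max_datanode_sends=max(n * (1 + backups) for n in received_per_dn))


def namenode_fan_out(received_per_dn: list[int], backups: int) -> Load:
    # DataNodes report only to the NN; the NN sends a copy to each backup node.
    return Load(namenode_sends=sum(received_per_dn) * backups,
                max_datanode_sends=max(received_per_dn))


def namenode_pipeline(received_per_dn: list[int], backups: int) -> Load:
    # The NN sends each notification once to the first backup node, which
    # forwards it down the chain (forwarding cost lands on the backup nodes).
    return Load(namenode_sends=sum(received_per_dn) if backups > 0 else 0,
                max_datanode_sends=max(received_per_dn))
```

For example, with 1,000 datanodes that each receive 100 blocks and two backup nodes, `direct_from_datanodes` gives `Load(namenode_sends=0, max_datanode_sends=300)`, `namenode_fan_out` gives `Load(namenode_sends=200000, max_datanode_sends=100)`, and `namenode_pipeline` gives `Load(namenode_sends=100000, max_datanode_sends=100)`. The direct approach costs more in total but keeps the namenode out of the path. The pipeline caps the namenode's cost, but only because this model leaves out the failure handling described above.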
> The NameNode should forward block reports to BackupNode
> -------------------------------------------------------
>
> Key: HDFS-839
> URL: https://issues.apache.org/jira/browse/HDFS-839
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
>
> The BackupNode (via HADOOP-4539) receives a stream of transactions from the
> NameNode. However, the BackupNode does not know the locations of blocks. It
> would be nice if the NameNode could forward all block reports (that it
> receives from DataNodes) to the BackupNode.
--
This message is automatically generated by JIRA.