[ https://issues.apache.org/jira/browse/HADOOP-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467867 ]
dhruba borthakur commented on HADOOP-923:
-----------------------------------------

Introduce a new DatanodeProtocol call named sendBlockModifications(). The namenode returns the blocks that are to be replicated or deleted as part of this call. The existing method sendHeartbeat() only updates the heartbeat array in the namenode; it does not send back the list of blocks that are pending replication or the blocks that are to be deleted.

The Datanode invokes the sendHeartbeat RPC once every 3 seconds. The Datanode invokes the sendBlockModifications RPC once every 10 heartbeats.

The namenode acquires only the heartbeat lock while processing the sendHeartbeat call. The namenode acquires the global FSNamesystem lock while processing the sendBlockModifications call.

The above change ensures that heartbeat processing time does not depend on the number of blocks that are pending replication.

> DFS Scalability: datanode heartbeat timeouts cause cascading timeouts of
> other datanodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-923
>                 URL: https://issues.apache.org/jira/browse/HADOOP-923
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>
> The datanode sends a heartbeat to the namenode every 3 seconds. The namenode
> processes the heartbeat and sends a list of blocks-to-be-replicated and
> blocks-to-be-deleted as part of the heartbeat response.
> At times when a couple of datanodes fail, the heartbeat processing on the
> namenode becomes pretty heavyweight. It acquires the global FSNamesystem
> lock, traverses the neededReplication structure, generates a list of blocks
> to be replicated and responds to the heartbeat message. Determining the list
> of blocks-to-be-replicated is pretty heavyweight, takes plenty of CPU and
> blocks processing of other heartbeats because of the global FSNamesystem lock.
> It would improve scalability a lot if heartbeat processing did not require
> the FSNamesystem lock. In fact, the "heartbeat" lock already exists for
> this purpose.
> I propose that the Heartbeat message be separate from the "retrieve
> blocks-to-replicate and blocks-to-delete" messages. The datanode can continue
> to heartbeat once every 3 seconds while it can afford to "retrieve
> blocks-to-replicate" at a much coarser interval. Heartbeat processing on the
> namenode will be fast because it does not require the global FSNamesystem
> lock. Moreover, a datanode failure will not aggravate the heartbeat
> processing time on the namenode.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
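
The split described in the comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HADOOP-923 patch: the class names NamenodeSketch and DatanodeSketch, the queueWork() and lastHeartbeatOf() helpers, and the use of plain strings for block commands are all hypothetical. Only the two call names, the two locks, and the one-in-ten ratio come from the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the namenode side; not the real Hadoop classes. */
class NamenodeSketch {
    // Cheap lock that guards only the heartbeat bookkeeping.
    private final Object heartbeatLock = new Object();
    // Stands in for the global FSNamesystem lock.
    private final Object namesystemLock = new Object();

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, List<String>> pendingWork = new HashMap<>();

    /** Hypothetical helper: queue a replicate/delete command for a datanode. */
    void queueWork(String datanode, String command) {
        synchronized (namesystemLock) {
            pendingWork.computeIfAbsent(datanode, k -> new ArrayList<>()).add(command);
        }
    }

    /** Fast path: records liveness under the heartbeat lock only. */
    void sendHeartbeat(String datanode, long nowMillis) {
        synchronized (heartbeatLock) {
            lastHeartbeat.put(datanode, nowMillis);
        }
    }

    /** Slow path: takes the global lock and hands back pending block work. */
    List<String> sendBlockModifications(String datanode) {
        synchronized (namesystemLock) {
            List<String> work = pendingWork.remove(datanode);
            return work == null ? new ArrayList<>() : work;
        }
    }

    /** Hypothetical helper for inspection; returns null if never seen. */
    Long lastHeartbeatOf(String datanode) {
        synchronized (heartbeatLock) {
            return lastHeartbeat.get(datanode);
        }
    }
}

/** Hypothetical stand-in for the datanode's heartbeat loop. */
class DatanodeSketch {
    // From the comment: one sendBlockModifications call per 10 heartbeats.
    static final int HEARTBEATS_PER_BLOCK_CHECK = 10;

    private final String name;
    private final NamenodeSketch namenode;
    private int heartbeatsSent = 0;
    final List<String> received = new ArrayList<>();

    DatanodeSketch(String name, NamenodeSketch namenode) {
        this.name = name;
        this.namenode = namenode;
    }

    /** One iteration of the loop that would run every 3 seconds. */
    void tick(long nowMillis) {
        namenode.sendHeartbeat(name, nowMillis);
        heartbeatsSent++;
        if (heartbeatsSent % HEARTBEATS_PER_BLOCK_CHECK == 0) {
            received.addAll(namenode.sendBlockModifications(name));
        }
    }
}
```

In this sketch a datanode that calls tick() every 3 seconds picks up queued block work only on every tenth tick, so the nine heartbeats in between never touch the global lock.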