[ https://issues.apache.org/jira/browse/HDFS-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036410#comment-13036410 ]
Matt Foley commented on HDFS-1961:
----------------------------------

Good start. A few suggestions:

Section 4.4: Suggest start with:
"All communication between Namenode and Datanode is initiated by the Datanode, and responded to by the Namenode. The Namenode never initiates communication to the Datanode, although Namenode responses may include commands to the Datanode that cause it to send further communications."

4.4.2 "DataNode Command – send heartbeat." Suggest change to "DataNode sends Heartbeat."
4.4.3 "DataNodeCommand – block report." Suggest change to "DataNode sends BlockReport."
4.4.4 "BlockReceived." Suggest change to "DataNode notifies BlockReceived."

Section 5.2: In the list of NN "threads", calling the first one "HeartBeat" is a little confusing. Please consider calling it something like "Datanode Health Management", instead. In the code it is called "HeartbeatMonitor", but its job is neither sending nor receiving heartbeats, but rather to periodically check to make sure that every Datanode has sent a heartbeat at least once in the last 10 minutes (or as configured).

Should probably also mention the bundle of threads that provide the Namenode's RPC service, which receives and processes all 13 kinds of communication from Datanodes and Clients.

Section 5.3: "This [blockReceived notification] may prevent NameNode temporarily from asking for a full block report since the receipt of a blockReceived() message indicates that the DataNode is still alive."

That sentence isn't correct, since it is relatively unusual for the NN to ask the DN for a block report. (It only happens when recovering from gross errors.) Instead, suggest including in this section a brief discussion of the fact that the DN sends a heartbeat to the NN every 3 seconds (or as configured), which allows the NN a chance to respond with commands such as
* "delete replica" if a block has become over-replicated, or
* "copy replica to this other DN" if a block needs further replication.
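
To make the heartbeat exchange described above concrete, here is a minimal Python sketch. It is not taken from the HDFS code: the class name ToyNamenode, its methods, the datanode ids, and the command strings are all hypothetical, and only the 3-second and 10-minute defaults come from the comment. It shows a Namenode that never calls a Datanode, returns queued commands in its heartbeat responses, and separately checks which Datanodes have gone silent.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 3       # default heartbeat period quoted in the comment
HEARTBEAT_EXPIRY_S = 10 * 60   # "at least once in the last 10 minutes"


@dataclass
class ToyNamenode:
    """Hypothetical stand-in for the Namenode: it only ever responds to calls."""
    last_heartbeat: dict = field(default_factory=dict)    # datanode id -> time of last heartbeat (s)
    pending_commands: dict = field(default_factory=dict)  # datanode id -> commands awaiting delivery

    def queue_command(self, dn_id, command):
        # The Namenode cannot push this to the Datanode; it waits for the next heartbeat.
        self.pending_commands.setdefault(dn_id, []).append(command)

    def handle_heartbeat(self, dn_id, now):
        # Invoked by the Datanode. Record liveness and return any queued commands
        # as the response to the heartbeat.
        self.last_heartbeat[dn_id] = now
        return self.pending_commands.pop(dn_id, [])

    def expired_datanodes(self, now):
        # The periodic health check: it neither sends nor receives heartbeats,
        # it only looks for Datanodes that have been silent for too long.
        return sorted(dn for dn, seen in self.last_heartbeat.items()
                      if now - seen > HEARTBEAT_EXPIRY_S)


# Usage: dn1 keeps sending heartbeats, dn2 falls silent after its first one.
nn = ToyNamenode()
nn.handle_heartbeat("dn1", now=0)
nn.handle_heartbeat("dn2", now=0)
nn.queue_command("dn1", "delete replica blk_1")  # e.g. the block became over-replicated
commands_for_dn1 = nn.handle_heartbeat("dn1", now=HEARTBEAT_INTERVAL_S)
silent = nn.expired_datanodes(now=HEARTBEAT_EXPIRY_S + 1)
```

The point of the sketch is the direction of the calls: commands travel only in responses to Datanode-initiated heartbeats, and the expiry check is a separate activity from heartbeat handling.
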
And the DN initiates a BlockReport to the NN every hour (or as configured), which prevents any divergence in the NN and DN belief about which replicas are held by each datanode. And yes, it also sends an immediate blockReceived notification whenever it receives a new block, whether from a Client (file create/append), or from another Datanode (block replication).

"A blockReport() is also issued periodically as a portion of the HeartBeat."

Not exactly. The DN's "heartbeat thread" takes care of sending both, at the appropriate time intervals, but they are separate RPCs to the NN.

> New architectural documentation created
> ---------------------------------------
>
>                 Key: HDFS-1961
>                 URL: https://issues.apache.org/jira/browse/HDFS-1961
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.21.0
>            Reporter: Rick Kazman
>              Labels: architecture, hadoop, newbie
>             Fix For: 0.21.0
>
>         Attachments: HDFS ArchDoc.Jira.docx
>
>
> This material provides an overview of the HDFS architecture and is intended
> for contributors. The goal of this document is to provide a guide to the
> overall structure of the HDFS code so that contributors can more effectively
> understand how changes that they are considering can be made, and the
> consequences of those changes. The assumption is that the reader has a basic
> understanding of HDFS, its purpose, and how it fits into the Hadoop project
> suite.
> An HTML version of the architectural documentation can be found at:
> http://kazman.shidler.hawaii.edu/ArchDoc.html
> All comments and suggestions for improvements are appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira