[ 
https://issues.apache.org/jira/browse/HDFS-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036410#comment-13036410
 ] 

Matt Foley commented on HDFS-1961:
----------------------------------

Good start.  A few suggestions:

Section 4.4:  Suggest starting with:
"All communication between Namenode and Datanode is initiated by the Datanode, 
and responded to by the Namenode. The Namenode never initiates communication to 
the Datanode, although Namenode responses may include commands to the Datanode 
that cause it to send further communications."

4.4.2 "DataNode Command – send heartbeat."
Suggest change to "DataNode sends Heartbeat."

4.4.3 "DataNodeCommand – block report."
Suggest change to "DataNode sends BlockReport."

4.4.4 "BlockReceived."
Suggest change to "DataNode notifies BlockReceived."

Section 5.2:
In the list of NN "threads", calling the first one "HeartBeat" is a little 
confusing.  Please consider calling it something like "Datanode Health 
Management" instead.  In the code it is called "HeartbeatMonitor", but its job 
is neither to send nor to receive heartbeats; rather, it periodically checks 
that every Datanode has sent a heartbeat at least once in the last 10 minutes 
(or as configured).
The list should probably also mention the bundle of threads that provide the 
Namenode's RPC service, which receives and processes all 13 kinds of 
communication from Datanodes and Clients.
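
To illustrate the "health management" role described above, here is a rough 
sketch of the kind of periodic check I mean. It is not the actual 
HeartbeatMonitor code: the class name, method name, and map-based bookkeeping 
are hypothetical, and the 10-minute window is hard-coded here only for 
illustration (in practice it is configurable).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the real HeartbeatMonitor class from the HDFS code.
final class DatanodeHealthSketch {
    // Expiry window taken from the comment above: 10 minutes (assumed fixed here).
    static final long EXPIRY_MS = 10 * 60 * 1000L;

    // Returns the IDs of Datanodes whose most recent heartbeat is older than
    // the expiry window. The monitor only inspects timestamps; it neither
    // sends nor receives heartbeats itself.
    static List<String> findExpired(Map<String, Long> lastHeartbeatMs, long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
            if (nowMs - entry.getValue() > EXPIRY_MS) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A background thread would call something like findExpired() on a timer and 
hand the result to whatever logic marks those Datanodes dead.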

Section 5.3:
"This [blockReceived notification] may prevent NameNode temporarily from asking 
for a full block report since the receipt of a blockReceived() message 
indicates that the DataNode is still alive."
That sentence isn't correct, since it is relatively unusual for the NN to ask 
the DN for a block report. (It only happens when recovering from gross errors.)

Instead, suggest including in this section a brief discussion of the fact that 
the DN sends a heartbeat to the NN every 3 seconds (or as configured), which 
gives the NN a chance to respond with commands such as 
* "delete replica" if a block has become over-replicated, or 
* "copy replica to this other DN" if a block needs further replication.  
The DN also initiates a BlockReport to the NN every hour (or as configured), 
which prevents any divergence between the NN's and the DN's beliefs about which 
replicas are held by each datanode.  
And yes, it also sends an immediate blockReceived notification whenever it 
receives a new block, whether from a Client (file create/append), or from 
another Datanode (block replication).
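
A minimal sketch of how the NN might choose a command to return in a heartbeat 
response, based on the two cases listed above. The class, method, and command 
strings are made up for illustration and do not correspond to the real 
DatanodeCommand types; the real NN logic considers much more than a replica 
count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of heartbeat-reply command selection; names are invented.
final class HeartbeatReplySketch {
    // Decides which command (if any) to piggyback on a heartbeat response
    // for a single block, given its live and target replica counts.
    static List<String> commandsFor(String blockId, int liveReplicas, int targetReplicas) {
        List<String> commands = new ArrayList<>();
        if (liveReplicas > targetReplicas) {
            // Over-replicated: ask the DN to delete its replica.
            commands.add("DELETE_REPLICA " + blockId);
        } else if (liveReplicas < targetReplicas) {
            // Under-replicated: ask the DN to copy its replica to another DN.
            commands.add("COPY_REPLICA " + blockId);
        }
        // Exactly at target: no command, the reply is just an acknowledgement.
        return commands;
    }
}
```

The point is only that the NN never calls the DN; it waits for the next 
heartbeat and puts its request in the response.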

"A blockReport() is also issued periodically as a portion of the HeartBeat."
Not exactly.  The DN's "heartbeat thread" takes care of sending both, at the 
appropriate time intervals, but they are separate RPCs to the NN.
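
Roughly, the DN-side scheduling looks like the sketch below: one loop, two 
independent timers, two distinct RPCs. This is a simplified illustration, not 
the actual DataNode code; the class name, tick() method, and the rpcLog list 
(which stands in for real RPC calls) are hypothetical, and the 3-second and 
1-hour intervals are the defaults mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the DN "heartbeat thread" loop body; not real HDFS code.
final class DatanodeLoopSketch {
    static final long HEARTBEAT_INTERVAL_MS = 3_000L;             // every 3 seconds
    static final long BLOCK_REPORT_INTERVAL_MS = 60 * 60 * 1000L; // every hour

    private long lastHeartbeatMs;
    private long lastBlockReportMs;

    // Stand-in for the RPC layer: records which RPC would have been sent.
    final List<String> rpcLog = new ArrayList<>();

    DatanodeLoopSketch(long startMs) {
        this.lastHeartbeatMs = startMs;
        this.lastBlockReportMs = startMs;
    }

    // One pass of the loop. Each RPC has its own timer and is sent on its own;
    // the block report is not carried inside the heartbeat.
    void tick(long nowMs) {
        if (nowMs - lastHeartbeatMs >= HEARTBEAT_INTERVAL_MS) {
            rpcLog.add("sendHeartbeat");
            lastHeartbeatMs = nowMs;
        }
        if (nowMs - lastBlockReportMs >= BLOCK_REPORT_INTERVAL_MS) {
            rpcLog.add("blockReport");
            lastBlockReportMs = nowMs;
        }
    }
}
```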

> New architectural documentation created
> ---------------------------------------
>
>                 Key: HDFS-1961
>                 URL: https://issues.apache.org/jira/browse/HDFS-1961
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.21.0
>            Reporter: Rick Kazman
>              Labels: architecture, hadoop, newbie
>             Fix For: 0.21.0
>
>         Attachments: HDFS ArchDoc.Jira.docx
>
>
> This material provides an overview of the HDFS architecture and is intended 
> for contributors. The goal of this document is to provide a guide to the 
> overall structure of the HDFS code so that contributors can more effectively 
> understand how changes that they are considering can be made, and the 
> consequences of those changes. The assumption is that the reader has a basic 
> understanding of HDFS, its purpose, and how it fits into the Hadoop project 
> suite. 
> An HTML version of the architectural documentation can be found at:  
> http://kazman.shidler.hawaii.edu/ArchDoc.html
> All comments and suggestions for improvements are appreciated.

