[ 
https://issues.apache.org/jira/browse/HADOOP-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676437#action_12676437
 ] 

Suresh Srinivas commented on HADOOP-4539:
-----------------------------------------

I will send comments on the design doc later.

Comments (lot of them are nits...):
Design related:
1. Design uses BackupNode as a subclass of NameNode. In future where Backup can 
transition to Active or Active can transition to Backup, this relationship 
seems stringent. Capturing active, backup as the state of Namenode might be a 
better idea.
2. If EditLogBackupOutputStream fails to write to the backup, how does active 
and backup recover from this condition. Looks like the current mechanism to 
recover a failed stream to disk is not workable for stream to backup.


Coding related:
1. General comments:
1.1. For consistently use NameNode and DataNode intead of name-node and 
data-node
1.2. Many class contructors call default constructor of the super class 
{{super()}}. This is unnecessary.
1.3. Many if statements are not enclosed in curly brackets. Do we follow this 
java coding convention?

2. UnregisterNodeException.java 
2.1. is shown as changed file. It should be new add and 
UnregisteredDatanodeException.java should be deleted.
2.2. Class comment still only talks about Datanode. It should be made generic 
to all types of node
2.3. Constructor can be change to take {storageID} and {name} instead of 
{DatanodeID} and {DatanodeInfo}

3. NodeRegistration.java
3.1. Class comment seems to indicate this is NameNode specific. Comment should 
indicate that it is generic.

4. CheckpointCommand.java
4.1. Additional description for the class to indicate that this is sent in 
response to {{startCheckpoint}} NamenodeProtocol command would be good.

5. NamenodeProtocol.java
5.1. In class comments why are the changes made for previous protocol version 
removed (in this case version 2). Isn't retaining this history a good idea for 
better documentation?
5.2. {{getBlocks()}} comment suggests that it i used by {{Balancer}}. Why have 
this comment if this could be used by things other than {{Balancer}} in the 
future.
5.3. Description for methods {{versionRequest}}, {{errorReport}}, 
{{endCheckpoint}}, 
5.4. {{startCheckpoint()}} good to indicate direction of the message (from 
backup to active)
5.5. Move static variables for Journal pipeline action to the beginning of the 
class
5.6. {{journal()}} better documentation on what it does and the direction of 
this message.

6. ServerCommand.java
6.1. Should the action static ints be moved to NamenodeProtocol.java? This is 
similar to Journal actions defined in NamenodeProtocol.java

7. HdfsConstants.java
7.1. It is better to be explicit about printing {{NamenodeRole}} as "Backup 
NameNode" and "Checkpoint NameNode" instead of just "Backup Node" and 
"Checkpoint Node".

8. BackupNode.java
8.1. In class comments </li> and </ol> are missing.
8.2. No need to defined {{LOG}} as it is inherited from {{NameNode}}
8.3. Using dfs.backup.address and dfs.backup.http.address means backup is 
defined at the install time. To change backup to active namenode, configuration 
change needs to be made. This could be restrictive in the future when a backup 
node can automatically transtion to become active.
8.4. When starting nodes, we may need to check the address read from 
configuration with the local node address for http.address etc.
8.5. Comment "NameNodeCommon methods" should change to "Common NameNode methods"
8.6. {{import org.apa/che.commons.logging.*;}} is not used
8.7. Replace {{import java.io.*; import java.net.*}} to import specific ones 
{{IOException, InetSocketAddress, SocketTimeoutException}}
8.8. {{handleShake()}} use {{NamenodeProtocol.NOTIFY}} instead of 
{{DatanodeProtocol.NOTIFY}} in {{errorReport}}
8.9. Should BackupNode override {{Namenode.errorReport()}}, in order to not do 
the processing of releasing backup node?
8.10. Should BackupNode throw exception for NamenodeProtocol methods 
{{NameNode.journalSize()}}?
8.11. {{stop()}} - is there a need for {{RPC.stopProxy(namenode)}}. Does 
{{super.stop()}} take care of it?


9. Checkpointer.java
9.1. {{doCheckPoint()}} calls 
{{backupNode.setCheckpointState(CheckpointStates.ROLLED_EDITS)}}. Should we do 
this in {{startJournalSpool()}}. That way for some reason {{startCheckpoint()}} 
to active fails, the checkpointer state remains consistent. {{doCheckPoint()}} 
should wait for state to reach CheckpointStates.ROLLED_EDITS to make sure the 
active has rolled edits before proceeding further with checkpoint. This helps 
in any boundary conditions where checkpoint could miss edits that might still 
come from active to previous edits log.
9.2. {{doCheckpoint()}} should call {{convergeJournalSpool()}} for BackupNode 
of type BACKUP only
9.3. Given that Checkpointer is looking into internals of BackupNode so much, 
should it be made private inner class of BackupNode?
9.4. Should {{downloadCheckpoint()}} and {{uploadCheckpoint()}} have 
registrationID or some other token that is ephemeral perhaps for validating the 
access. Otherwise these URLs could be misused to mess with the Namenode state. 
This could be done in another jira.
9.5. {{run()}} Use either {{period}} or {{checkpointPeriod}} in the while loop. 
Also we can keep {{1000 * period}} calculated outside the while loop.

10. NameNode.java
10.1. For better readability NameNode could support instead of {{node.getRole() 
== NamenodeRole.BACKUP}}, method {{isRole(NamenodeRole.BACKUP)}} or 
{{isActive()}}, {{isBackup()}}, {{isCheckpointer()}}.
10.2. Not sure what the comment {{// trash}} in {{NameNode}} constructor is 
far...

11. CheckpointSignature.java
11.1. Need to create a separate jira to enhance this to include additional 
information that could help in catching error conditions such as backup node 
FSImage number of files not matching the active node etc. Currently any 
condition where backup does not match the active goes undetected.

12 EditLogBackupInputStream.java
12.1. Should constructor be using parameter {{name}} to set the member 
{{address}}?
12.2. {{getType()}} should return {{JournalType.BACKUP}}
12.3. Class comments need to be updated


13. FSNamesystem.java
13.1. {{getStorageDirs()}} {{FSNamesystem.LOG}} can just be {{LOG()}}


14. FSImage.java
14.1. minor nit - {{startCheckpoint()}} comment has name-name.
14.2. {{startCheckpoint()}} should some of the {{if}} conditions be {{else if}} 
where {{msg}} is set
14.3. Adding parentheses to the if condition that combines || with && will make 
it more readable



> Streaming Edits to a Standby Name-Node.
> ---------------------------------------
>
>                 Key: HADOOP-4539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4539
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: BackupNode.patch, image001.gif, StreamEditsToBN.pdf, 
> StreamEditsToSNN.htm
>
>
> Currently Secondary name-node acts as mere checkpointer.
> Secondary name-node should be transformed into a standby name-node (SNN). 
> The long term goal is to make it a warm standby. 
> The purpose of this issue is to provide real time streaming of edits to SNN 
> so that it contained the up-to-date namespace state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to