[
https://issues.apache.org/jira/browse/HADOOP-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676437#action_12676437
]
Suresh Srinivas commented on HADOOP-4539:
-----------------------------------------
I will send comments on the design doc later.
Comments (lot of them are nits...):
Design related:
1. Design uses BackupNode as a subclass of NameNode. In future where Backup can
transition to Active or Active can transition to Backup, this relationship
seems stringent. Capturing active, backup as the state of Namenode might be a
better idea.
2. If EditLogBackupOutputStream fails to write to the backup, how does active
and backup recover from this condition. Looks like the current mechanism to
recover a failed stream to disk is not workable for stream to backup.
Coding related:
1. General comments:
1.1. For consistently use NameNode and DataNode intead of name-node and
data-node
1.2. Many class contructors call default constructor of the super class
{{super()}}. This is unnecessary.
1.3. Many if statements are not enclosed in curly brackets. Do we follow this
java coding convention?
2. UnregisterNodeException.java
2.1. is shown as changed file. It should be new add and
UnregisteredDatanodeException.java should be deleted.
2.2. Class comment still only talks about Datanode. It should be made generic
to all types of node
2.3. Constructor can be change to take {storageID} and {name} instead of
{DatanodeID} and {DatanodeInfo}
3. NodeRegistration.java
3.1. Class comment seems to indicate this is NameNode specific. Comment should
indicate that it is generic.
4. CheckpointCommand.java
4.1. Additional description for the class to indicate that this is sent in
response to {{startCheckpoint}} NamenodeProtocol command would be good.
5. NamenodeProtocol.java
5.1. In class comments why are the changes made for previous protocol version
removed (in this case version 2). Isn't retaining this history a good idea for
better documentation?
5.2. {{getBlocks()}} comment suggests that it i used by {{Balancer}}. Why have
this comment if this could be used by things other than {{Balancer}} in the
future.
5.3. Description for methods {{versionRequest}}, {{errorReport}},
{{endCheckpoint}},
5.4. {{startCheckpoint()}} good to indicate direction of the message (from
backup to active)
5.5. Move static variables for Journal pipeline action to the beginning of the
class
5.6. {{journal()}} better documentation on what it does and the direction of
this message.
6. ServerCommand.java
6.1. Should the action static ints be moved to NamenodeProtocol.java? This is
similar to Journal actions defined in NamenodeProtocol.java
7. HdfsConstants.java
7.1. It is better to be explicit about printing {{NamenodeRole}} as "Backup
NameNode" and "Checkpoint NameNode" instead of just "Backup Node" and
"Checkpoint Node".
8. BackupNode.java
8.1. In class comments </li> and </ol> are missing.
8.2. No need to defined {{LOG}} as it is inherited from {{NameNode}}
8.3. Using dfs.backup.address and dfs.backup.http.address means backup is
defined at the install time. To change backup to active namenode, configuration
change needs to be made. This could be restrictive in the future when a backup
node can automatically transtion to become active.
8.4. When starting nodes, we may need to check the address read from
configuration with the local node address for http.address etc.
8.5. Comment "NameNodeCommon methods" should change to "Common NameNode methods"
8.6. {{import org.apa/che.commons.logging.*;}} is not used
8.7. Replace {{import java.io.*; import java.net.*}} to import specific ones
{{IOException, InetSocketAddress, SocketTimeoutException}}
8.8. {{handleShake()}} use {{NamenodeProtocol.NOTIFY}} instead of
{{DatanodeProtocol.NOTIFY}} in {{errorReport}}
8.9. Should BackupNode override {{Namenode.errorReport()}}, in order to not do
the processing of releasing backup node?
8.10. Should BackupNode throw exception for NamenodeProtocol methods
{{NameNode.journalSize()}}?
8.11. {{stop()}} - is there a need for {{RPC.stopProxy(namenode)}}. Does
{{super.stop()}} take care of it?
9. Checkpointer.java
9.1. {{doCheckPoint()}} calls
{{backupNode.setCheckpointState(CheckpointStates.ROLLED_EDITS)}}. Should we do
this in {{startJournalSpool()}}. That way for some reason {{startCheckpoint()}}
to active fails, the checkpointer state remains consistent. {{doCheckPoint()}}
should wait for state to reach CheckpointStates.ROLLED_EDITS to make sure the
active has rolled edits before proceeding further with checkpoint. This helps
in any boundary conditions where checkpoint could miss edits that might still
come from active to previous edits log.
9.2. {{doCheckpoint()}} should call {{convergeJournalSpool()}} for BackupNode
of type BACKUP only
9.3. Given that Checkpointer is looking into internals of BackupNode so much,
should it be made private inner class of BackupNode?
9.4. Should {{downloadCheckpoint()}} and {{uploadCheckpoint()}} have
registrationID or some other token that is ephemeral perhaps for validating the
access. Otherwise these URLs could be misused to mess with the Namenode state.
This could be done in another jira.
9.5. {{run()}} Use either {{period}} or {{checkpointPeriod}} in the while loop.
Also we can keep {{1000 * period}} calculated outside the while loop.
10. NameNode.java
10.1. For better readability NameNode could support instead of {{node.getRole()
== NamenodeRole.BACKUP}}, method {{isRole(NamenodeRole.BACKUP)}} or
{{isActive()}}, {{isBackup()}}, {{isCheckpointer()}}.
10.2. Not sure what the comment {{// trash}} in {{NameNode}} constructor is
far...
11. CheckpointSignature.java
11.1. Need to create a separate jira to enhance this to include additional
information that could help in catching error conditions such as backup node
FSImage number of files not matching the active node etc. Currently any
condition where backup does not match the active goes undetected.
12 EditLogBackupInputStream.java
12.1. Should constructor be using parameter {{name}} to set the member
{{address}}?
12.2. {{getType()}} should return {{JournalType.BACKUP}}
12.3. Class comments need to be updated
13. FSNamesystem.java
13.1. {{getStorageDirs()}} {{FSNamesystem.LOG}} can just be {{LOG()}}
14. FSImage.java
14.1. minor nit - {{startCheckpoint()}} comment has name-name.
14.2. {{startCheckpoint()}} should some of the {{if}} conditions be {{else if}}
where {{msg}} is set
14.3. Adding parentheses to the if condition that combines || with && will make
it more readable
> Streaming Edits to a Standby Name-Node.
> ---------------------------------------
>
> Key: HADOOP-4539
> URL: https://issues.apache.org/jira/browse/HADOOP-4539
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Attachments: BackupNode.patch, image001.gif, StreamEditsToBN.pdf,
> StreamEditsToSNN.htm
>
>
> Currently Secondary name-node acts as mere checkpointer.
> Secondary name-node should be transformed into a standby name-node (SNN).
> The long term goal is to make it a warm standby.
> The purpose of this issue is to provide real time streaming of edits to SNN
> so that it contained the up-to-date namespace state.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.