[ 
https://issues.apache.org/jira/browse/HADOOP-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680719#action_12680719
 ] 

Konstantin Shvachko commented on HADOOP-4539:
---------------------------------------------

I am going to comment on Suresh's items that are not reflected in the new patch.

Design related:

1. Active node does not turn into Backup node in current design. We plan to 
have a possibility to turn a Standby node into Active, yes but this is done by 
just changing the role of the node. I think it makes sense to keep BackupNode 
as a subclass of NameNode because some commands, those that don't modify the 
meta-data, will work on the backup node exactly as they do on the active node. 
Say you can do {{lsr}} from the backup, but if you try to create a file it will 
throw {{SafeModeException}}.

2. EditLogBackupOutputStream failing to write to the backup node is handled by 
calling {{processIOError()}}, which removes the failed BackupOutputStream from 
the list of EditLogOutputStream(s) and increments the checkpointTime for the 
remaining streams to make the failed image outdated.

Coding related:

1.1. name-node and data-node vs NameNode and DataNode is actually consistent if 
you look at other comments. Lets say NameNode and DataNode refer to classes and 
name-node and data-node to the abstractions behind them.
1.2. Calling super() in class constructors is unnecessary, but helps me to 
debug and analyze the code. You can set a breakpoint, step into, and easily 
find the code of the super constructor.

2.3. {{UnregisteredNodeException}} Constructor takes DatanodeID and 
DatanodeInfo as parameters because this is a constructor specific for 
data-nodes. We may later need name-node, balancer, etc specific constructors, 
which will have their special parameters.

5.1. We do not keep comments for previous versions in the code, they can be 
retrieved from svn history. I copied the respective comment from ClientProtocol.

6.1. I think that {{ServerCommand}} actions (static ints) should belong to the 
commands that use them. I plan to file an issue to move {{DatanodeCommand}} 
actions into the respective commands.

7.1. My reasoning is that we have NameNode, DataNode, and now we will have 
Backup Node and Checkpoint node. I first thought about naming them "Backup 
NameNode" and "Checkpoint NameNode" connotative to SecondaryNameNode. 
But "Backup node" sounds better to me and besides this is what they are - nodes.

8.3. {{dfs.backup.address}} and {{dfs.backup.http.address}} define the 
addresses which the backup node starts on its rpc and http servers. You do not 
need to change the configuration when the node transitions from backup to 
active. 
Example. If you start DataNode on port 0, which means on any free port, you do 
not change the config file after startup, right? Same with backup/active nodes.
8.9. Backup node does not have backup streams now, so {{errorReport()}} is just 
harmless. If we decide to implement chaining edits streams then yes we will 
need to override {{errorReport()}} and probably other methods too.
8.10. {{BackupNode.journalSize()}} will report the correct size of its edits 
file. I don't think it should throw an exception.
8.11. {{BackupNode.stop()}} should call {{RPC.stopProxy()}} because it needs to 
stop the name-node proxy it is connected to as a client.
And the {{NameNode.stop()}} will not take care of that because NameNode does 
not act as its own client.

9.1. {{Checkpointer.doCheckPoint()}} first calls {{NameNode.startCheckpoint()}} 
which in turn calls {{startJournalSpool()}} from the NameNode itself and does 
not return until the BackupNode actually rolls its edits.
9.2. {{convergeJournalSpool()}} contains common logic for Backup and Checkpoint 
nodes. I agree the name does not make much sense for the checkpoint node since 
it does not converge no journals as it doesn't have any.
But for the backup node its reflects the semantics of the operation.
9.3. {{Checkpointer}} should not be an inner class of the {{BackupNode}}, imo. 
We have been through this with FSNamesystem class, had to factor out most of 
the stuff out of it.
9.4. {{downloadCheckpoint()}} and {{uploadCheckpoint()}} should evelve in the 
future. Right now they use {{CheckpointSignature}} for validation, but we may 
need to include registration as well.

11.1. {{CheckpointSignature}} currently includes the name-nodes's edits file 
modification time only for identification purposes. I planned to create an 
issue to include there the number of files in the namepsace at the time when 
the checkpoint starts.

> Streaming Edits to a Standby Name-Node.
> ---------------------------------------
>
>                 Key: HADOOP-4539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4539
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.21.0
>
>         Attachments: BackupNode.patch, BackupNode.patch, BackupNode.patch, 
> image001.gif, StreamEditsToBN.pdf, StreamEditsToSNN.htm
>
>
> Currently Secondary name-node acts as mere checkpointer.
> Secondary name-node should be transformed into a standby name-node (SNN). 
> The long term goal is to make it a warm standby. 
> The purpose of this issue is to provide real time streaming of edits to SNN 
> so that it contained the up-to-date namespace state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to