[
https://issues.apache.org/jira/browse/HADOOP-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680719#action_12680719
]
Konstantin Shvachko commented on HADOOP-4539:
---------------------------------------------
I am going to comment on Suresh's items that are not reflected in the new patch.
Design related:
1. Active node does not turn into Backup node in current design. We plan to
have a possibility to turn a Standby node into Active, yes but this is done by
just changing the role of the node. I think it makes sense to keep BackupNode
as a subclass of NameNode because some commands, those that don't modify the
meta-data, will work on the backup node exactly as they do on the active node.
Say you can do {{lsr}} from the backup, but if you try to create a file it will
throw {{SafeModeException}}.
2. EditLogBackupOutputStream failing to write to the backup node is handled by
calling {{processIOError()}}, which removes the failed BackupOutputStream from
the list of EditLogOutputStream(s) and increments the checkpointTime for the
remaining streams to make the failed image outdated.
Coding related:
1.1. name-node and data-node vs NameNode and DataNode is actually consistent if
you look at other comments. Lets say NameNode and DataNode refer to classes and
name-node and data-node to the abstractions behind them.
1.2. Calling super() in class constructors is unnecessary, but helps me to
debug and analyze the code. You can set a breakpoint, step into, and easily
find the code of the super constructor.
2.3. {{UnregisteredNodeException}} Constructor takes DatanodeID and
DatanodeInfo as parameters because this is a constructor specific for
data-nodes. We may later need name-node, balancer, etc specific constructors,
which will have their special parameters.
5.1. We do not keep comments for previous versions in the code, they can be
retrieved from svn history. I copied the respective comment from ClientProtocol.
6.1. I think that {{ServerCommand}} actions (static ints) should belong to the
commands that use them. I plan to file an issue to move {{DatanodeCommand}}
actions into the respective commands.
7.1. My reasoning is that we have NameNode, DataNode, and now we will have
Backup Node and Checkpoint node. I first thought about naming them "Backup
NameNode" and "Checkpoint NameNode" connotative to SecondaryNameNode.
But "Backup node" sounds better to me and besides this is what they are - nodes.
8.3. {{dfs.backup.address}} and {{dfs.backup.http.address}} define the
addresses which the backup node starts on its rpc and http servers. You do not
need to change the configuration when the node transitions from backup to
active.
Example. If you start DataNode on port 0, which means on any free port, you do
not change the config file after startup, right? Same with backup/active nodes.
8.9. Backup node does not have backup streams now, so {{errorReport()}} is just
harmless. If we decide to implement chaining edits streams then yes we will
need to override {{errorReport()}} and probably other methods too.
8.10. {{BackupNode.journalSize()}} will report the correct size of its edits
file. I don't think it should throw an exception.
8.11. {{BackupNode.stop()}} should call {{RPC.stopProxy()}} because it needs to
stop the name-node proxy it is connected to as a client.
And the {{NameNode.stop()}} will not take care of that because NameNode does
not act as its own client.
9.1. {{Checkpointer.doCheckPoint()}} first calls {{NameNode.startCheckpoint()}}
which in turn calls {{startJournalSpool()}} from the NameNode itself and does
not return until the BackupNode actually rolls its edits.
9.2. {{convergeJournalSpool()}} contains common logic for Backup and Checkpoint
nodes. I agree the name does not make much sense for the checkpoint node since
it does not converge no journals as it doesn't have any.
But for the backup node its reflects the semantics of the operation.
9.3. {{Checkpointer}} should not be an inner class of the {{BackupNode}}, imo.
We have been through this with FSNamesystem class, had to factor out most of
the stuff out of it.
9.4. {{downloadCheckpoint()}} and {{uploadCheckpoint()}} should evelve in the
future. Right now they use {{CheckpointSignature}} for validation, but we may
need to include registration as well.
11.1. {{CheckpointSignature}} currently includes the name-nodes's edits file
modification time only for identification purposes. I planned to create an
issue to include there the number of files in the namepsace at the time when
the checkpoint starts.
> Streaming Edits to a Standby Name-Node.
> ---------------------------------------
>
> Key: HADOOP-4539
> URL: https://issues.apache.org/jira/browse/HADOOP-4539
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: BackupNode.patch, BackupNode.patch, BackupNode.patch,
> image001.gif, StreamEditsToBN.pdf, StreamEditsToSNN.htm
>
>
> Currently Secondary name-node acts as mere checkpointer.
> Secondary name-node should be transformed into a standby name-node (SNN).
> The long term goal is to make it a warm standby.
> The purpose of this issue is to provide real time streaming of edits to SNN
> so that it contained the up-to-date namespace state.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.