[ http://issues.apache.org/jira/browse/HADOOP-227?page=comments#action_12458332 ]
dhruba borthakur commented on HADOOP-227:
-----------------------------------------
Here is a much more detailed writeup on the Backup NameNode proposal.
"Secondary NameNode" and "Backup NameNode" refer to the same node in this
writeup. Please review and comment.
Configuration
-------------
There will be an additional file named "masters" in the configuration directory
(similar to the "slaves" file) that lists the names of the nodes where the
Secondary NameNode should run. The start-dfs.sh script will start the
Secondary NameNode on those nodes.
The configuration file will have the following new definitions:
* fs.checkpoint.dir : Location where the Secondary NameNode can download the
fsImage and edits files.
* fs.checkpoint.period : Time (in seconds) between two checkpoints.
* fs.checkpoint.size : Size (in MB) of the edit log that triggers a checkpoint.
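As a rough illustration of how these three settings might be read, here is a
minimal sketch using plain java.util.Properties rather than Hadoop's own
configuration classes. The class name CheckpointSettings and all default
values are assumptions made for the example; the proposal does not specify
defaults.

```java
import java.util.Properties;

// Hypothetical holder for the three proposed checkpoint settings.
class CheckpointSettings {
    final String checkpointDir;
    final long periodSeconds;
    final long sizeBytes;

    CheckpointSettings(Properties props) {
        // The default values below are illustrative assumptions only.
        checkpointDir = props.getProperty("fs.checkpoint.dir", "/tmp/dfs/namesecondary");
        periodSeconds = Long.parseLong(props.getProperty("fs.checkpoint.period", "3600"));
        // fs.checkpoint.size is given in MB; convert once to bytes.
        long sizeMb = Long.parseLong(props.getProperty("fs.checkpoint.size", "64"));
        sizeBytes = sizeMb * 1024L * 1024L;
    }
}
```

Converting the size to bytes up front keeps later comparisons against the edit
log size free of unit mistakes.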
The Secondary NameNode will use the "org.apache.hadoop.dfs.NameNode.Alternate"
property to log its debug and informational messages.
Primary NameNode
----------------
The Primary NameNode will add the following new RPCs to the ClientProtocol:
* getEditLogSize()
This call returns the size of the current edit log file. This call fails
if the NameNode is in SafeMode or there is more than one edit log file.
* rollEditLog()
This call closes the current edit log and opens a new edit log file.
The names of the edit files are either "edits" or "edits.new". To keep
complexity to a minimum, there will be a maximum of two edit log
files: "edits" and "edits.1".
This call returns an error if any of the following conditions occur:
- NameNode is in SafeMode
- Both "edits" and "edits.new" already exist
* rollFsImage()
This call does the following steps (atomically):
- removes fsImage
- copies fsImage.tmp to fsImage
- removes edits
- moves edits.new to edits
This call fails if any of the files fsImage, fsImage.tmp or edits
does not exist. It also fails if the DFS is in SafeMode.
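The preconditions of the three RPCs can be sketched against a plain directory
of files. This is only a model under stated assumptions, not Hadoop code: the
class NameDirSketch is hypothetical, the size is reported in bytes (the
writeup does not give a unit), and the second edit log is called "edits.new"
here even though the writeup also refers to it as "edits.1". Two consecutive
file moves are not atomic as a pair, so a real implementation would need
extra care to meet the "atomically" requirement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical model of the Primary NameNode's dfs.name.dir; not Hadoop code.
class NameDirSketch {
    private final Path dir;
    private boolean safeMode = false;

    NameDirSketch(Path dir) { this.dir = dir; }

    void setSafeMode(boolean on) { safeMode = on; }

    private void failIfSafeMode() throws IOException {
        if (safeMode) throw new IOException("NameNode is in SafeMode");
    }

    // Returns the size of the current edit log (bytes are an assumption).
    long getEditLogSize() throws IOException {
        failIfSafeMode();
        if (Files.exists(dir.resolve("edits.new"))) {
            throw new IOException("more than one edit log file exists");
        }
        return Files.size(dir.resolve("edits"));
    }

    // Opens edits.new; closing the old log is implicit in this model.
    void rollEditLog() throws IOException {
        failIfSafeMode();
        if (Files.exists(dir.resolve("edits")) && Files.exists(dir.resolve("edits.new"))) {
            throw new IOException("both edits and edits.new already exist");
        }
        Files.createFile(dir.resolve("edits.new"));
    }

    // Promotes fsImage.tmp to fsImage and edits.new to edits.
    void rollFsImage() throws IOException {
        failIfSafeMode();
        // Check every file first so a failure leaves the directory untouched.
        for (String name : new String[] {"fsImage", "fsImage.tmp", "edits", "edits.new"}) {
            if (!Files.exists(dir.resolve(name))) {
                throw new IOException(name + " does not exist");
            }
        }
        // A replacing move stands in for the "remove, then copy/move" steps.
        Files.move(dir.resolve("fsImage.tmp"), dir.resolve("fsImage"),
                   StandardCopyOption.REPLACE_EXISTING);
        Files.move(dir.resolve("edits.new"), dir.resolve("edits"),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Using a move instead of a copy for fsImage.tmp is a choice made for this
sketch: it leaves no stale fsImage.tmp behind, so a later putFsImage upload
would not be rejected.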
The NameNode will have two additional servlets:
* putFsImage.class
This servlet causes all the incoming data to be stored in a file
named fsImage.tmp in the dfs.name.dir directory. If this file already
exists, then this call returns an error.
* getFile.class?param=pathname
This servlet retrieves the contents of the specified file.
At startup, the Primary NameNode deletes fsImage.tmp (if it exists). The
NameNode loads the fsImage, then loads edits, and then loads edits.1. It then
writes the merged fsImage and deletes edits and edits.1.
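The replay order matters: the older log must be applied before the newer one.
A minimal in-memory sketch of the merge follows. The data model is entirely an
assumption made for illustration (the image is a set of paths, and each edit
is a line of the form "ADD <path>" or "DEL <path>"); the real fsImage and
edits formats are not described in this writeup.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: image = set of paths, edit = "ADD <path>" or "DEL <path>".
class StartupMergeSketch {
    // Applies each edit log in the order given (oldest first) to the image.
    static TreeSet<String> merge(List<String> image, List<List<String>> editLogs) {
        TreeSet<String> namespace = new TreeSet<>(image);
        for (List<String> log : editLogs) {
            for (String edit : log) {
                String[] parts = edit.split(" ", 2);
                if (parts[0].equals("ADD")) {
                    namespace.add(parts[1]);
                } else if (parts[0].equals("DEL")) {
                    namespace.remove(parts[1]);
                } else {
                    throw new IllegalArgumentException("unknown edit: " + edit);
                }
            }
        }
        return namespace;
    }
}
```

Passing the logs as an ordered list makes the "edits first, then edits.1" rule
explicit at the call site.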
Secondary NameNode
------------------
The Secondary NameNode periodically pings the NameNode with the
getEditLogSize() RPC. This call returns the size of the current edit
log. The Secondary NameNode initiates a checkpoint if either the size of the
edit log exceeds the size specified in fs.checkpoint.size or the time
since the last checkpoint completed exceeds fs.checkpoint.period.
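The trigger rule above amounts to a simple "either threshold exceeded" test.
A minimal sketch, assuming the edit log size is reported in bytes and using a
hypothetical helper named CheckpointTrigger:

```java
// Hypothetical helper; the names and the byte unit are assumptions.
class CheckpointTrigger {
    static boolean shouldCheckpoint(long editLogBytes,
                                    long secondsSinceLastCheckpoint,
                                    long checkpointSizeMb,
                                    long checkpointPeriodSeconds) {
        // fs.checkpoint.size is configured in MB.
        long thresholdBytes = checkpointSizeMb * 1024L * 1024L;
        return editLogBytes > thresholdBytes
            || secondsSinceLastCheckpoint > checkpointPeriodSeconds;
    }
}
```

For example, with fs.checkpoint.size set to 4 and fs.checkpoint.period set to
3600, a 5 MB edit log triggers a checkpoint even if the last one finished a
minute ago.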
The Secondary NameNode issues the rollEditLog() RPC to instruct the Primary
NameNode to start logging edits into edits.1. The Secondary NameNode then uses
the getFile servlet to fetch the contents of fsImage and edits. It puts them in
fs.checkpoint.dir, reads them into memory, merges them, and writes the result
to fsImage.tmp. The Secondary NameNode then uploads the fsImage.tmp file
to the Primary NameNode using the putFsImage servlet.
Once the above steps are successful, the Secondary NameNode issues the
rollFsImage() RPC. A checkpoint is complete when this RPC completes
successfully.
If any of the RPC calls returns an error, the Secondary NameNode discards all
processing that it might have done, logs an error message, and waits for the
normal trigger to start the next checkpoint.
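The whole checkpoint cycle, including the "give up on any error" rule, can be
sketched as one method. The PrimaryOps and Merger interfaces below are
hypothetical stand-ins for the RPCs and servlets described earlier; nothing
here is an actual Hadoop interface.

```java
import java.io.IOException;

// Hypothetical view of the Primary NameNode as seen from the Secondary.
interface PrimaryOps {
    void rollEditLog() throws IOException;            // RPC
    byte[] getFile(String name) throws IOException;   // getFile servlet
    void putFsImage(byte[] image) throws IOException; // putFsImage servlet
    void rollFsImage() throws IOException;            // RPC
}

class CheckpointCycleSketch {
    // Hypothetical merge step: combines an image with an edit log.
    interface Merger {
        byte[] merge(byte[] image, byte[] edits);
    }

    // Runs one checkpoint; returns true only if every step succeeded.
    static boolean runOnce(PrimaryOps primary, Merger merger) {
        try {
            primary.rollEditLog();
            byte[] image = primary.getFile("fsImage");
            byte[] edits = primary.getFile("edits");
            byte[] merged = merger.merge(image, edits);
            primary.putFsImage(merged);
            primary.rollFsImage();
            return true;
        } catch (IOException e) {
            // Discard local work, log, and wait for the next normal trigger.
            System.err.println("checkpoint failed: " + e.getMessage());
            return false;
        }
    }
}
```

Because all intermediate results are local variables, returning false is
enough to discard them; the caller simply goes back to its polling loop.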
Issues
------
1. The emphasis is on simplicity. For this reason, the NameNode allows at most
two outstanding edits files at any time: edits and edits.1. This ensures that
there cannot be more than one Secondary NameNode for a Primary NameNode.
2. Because rollFsImage() fails if either edits or edits.1 does not exist, the
system is protected against a spurious checkpoint if the NameNode restarts
while the Secondary NameNode is doing a merge. This check can be made more
explicit by returning a cookie from the rollEditLog() call and requiring that
rollFsImage() supply the same cookie. (The Primary NameNode resets the cookie
if it restarts.)
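The cookie idea can be sketched as follows. The class CookieGuardSketch and
its restart() method are hypothetical. The sketch also assumes that cookies
are never reused across a restart, which a real NameNode would have to
guarantee in some way (for example with a random or time-based value), since
its in-memory counter would not survive.

```java
// Hypothetical model of the cookie check; not Hadoop code.
class CookieGuardSketch {
    private long nextCookie = 1;
    private Long outstanding = null; // null means no checkpoint is in progress

    // Starts a checkpoint and hands the caller a cookie identifying it.
    long rollEditLog() {
        if (outstanding != null) {
            throw new IllegalStateException("a checkpoint is already in progress");
        }
        outstanding = nextCookie++;
        return outstanding;
    }

    // Completes the checkpoint only if the caller presents the matching cookie.
    void rollFsImage(long cookie) {
        if (outstanding == null || outstanding.longValue() != cookie) {
            throw new IllegalStateException("stale or unknown checkpoint cookie");
        }
        outstanding = null;
    }

    // Models a NameNode restart: any in-progress checkpoint is forgotten.
    void restart() {
        outstanding = null;
    }
}
```

With this check, a Secondary NameNode that began its merge before a restart is
rejected when it calls rollFsImage(), regardless of which files happen to
exist on disk.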
> Namespace check pointing is not performed until the namenode restarts.
> ----------------------------------------------------------------------
>
> Key: HADOOP-227
> URL: http://issues.apache.org/jira/browse/HADOOP-227
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.2.0
> Reporter: Konstantin Shvachko
> Assigned To: dhruba borthakur
> Attachments: patch-async-checkpoints-0.9.0,
> patch-async-checkpoints-0.9.0, patch-async-checkpoints-0.9.0
>
>
> In the current implementation, when the name node starts, it reads its image
> file, then the edits file, and then saves the updated image back into the
> image file. The image file is never updated after that.
> In order to provide system reliability, the namespace information should be
> check pointed periodically, and the edits file should be kept relatively
> small.