[ 
http://issues.apache.org/jira/browse/HADOOP-227?page=comments#action_12458332 ] 
            
dhruba borthakur commented on HADOOP-227:
-----------------------------------------

Here is a more detailed writeup on the Backup NameNode proposal. "Secondary 
NameNode" and "Backup NameNode" refer to the same node in this writeup. Please 
review and comment.

Configuration
-------------
There will be an additional file named "masters" in the configuration directory 
(similar to the "slaves" file) that will list the names of the nodes where the 
Secondary NameNode should run. The start-dfs.sh script will start the 
Secondary NameNode on those nodes.

The configuration file will have the following new definitions:
    * fs.checkpoint.dir      : Location where the Secondary NameNode stores the
                               downloaded fsImage and edits files.
    * fs.checkpoint.period   : Time (in seconds) between two checkpoints.
    * fs.checkpoint.size     : Size (in MB) of the edit log that triggers a
                               checkpoint.
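
As a rough illustration of how the three definitions might be read, here is a 
minimal Java sketch. CheckpointSettings is a hypothetical class, not part of 
Hadoop, and java.util.Properties stands in for the real configuration 
mechanism. No default values are assumed because the writeup does not give any.

```java
import java.util.Properties;

// Hypothetical holder for the three proposed settings; not an existing Hadoop class.
final class CheckpointSettings {
    final String checkpointDir;  // fs.checkpoint.dir
    final long periodSeconds;    // fs.checkpoint.period, in seconds
    final long sizeMb;           // fs.checkpoint.size, in MB

    private CheckpointSettings(String checkpointDir, long periodSeconds, long sizeMb) {
        this.checkpointDir = checkpointDir;
        this.periodSeconds = periodSeconds;
        this.sizeMb = sizeMb;
    }

    // Reads all three keys and fails loudly if one is missing.
    static CheckpointSettings from(Properties props) {
        return new CheckpointSettings(
            required(props, "fs.checkpoint.dir"),
            Long.parseLong(required(props, "fs.checkpoint.period")),
            Long.parseLong(required(props, "fs.checkpoint.size")));
    }

    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException(key + " is not set");
        }
        return value.trim();
    }
}
```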

The Secondary NameNode will use the "org.apache.hadoop.dfs.NameNode.Alternate" 
property to log its debug and informational messages.

Primary NameNode
----------------
The Primary NameNode will add the following new RPCs to the ClientProtocol:

    * getEditLogSize()
        This call returns the size of the current edit log file. This call fails
        if the NameNode is in SafeMode or there is more than one edit log file.

    * rollEditLog()
        This call closes the current edit log and opens a new edit log file.
        To keep complexity to a minimum, there will be a maximum of two edit
        log files, named "edits" and "edits.1".
        This call returns an error if any of the following conditions occur:
        - NameNode is in SafeMode
        - Both "edits" and "edits.1" already exist

    * rollFsImage()
        This call does the following steps (atomically):
        - removes fsImage
        - copies fsImage.tmp to fsImage
        - removes edits
        - moves edits.1 to edits
        This call fails if any of the files fsImage, fsImage.tmp or edits
        does not exist. It also fails if the dfs is in SafeMode.
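
The file transitions behind rollEditLog() and rollFsImage() can be traced with 
a toy in-memory model. This is only a sketch: NameNodeFilesModel is a 
hypothetical class that tracks file names in a set instead of touching 
dfs.name.dir, the second edit log is named edits.1, and the image upload is 
reduced to a single method. The rollFsImage() precondition combines the check 
listed above with the edits.1 check mentioned under Issues. The model also 
drops fsImage.tmp after installing it so that a later upload is not rejected, 
which is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the proposed calls; not Hadoop code.
final class NameNodeFilesModel {
    private final Set<String> files = new HashSet<>();
    private boolean safeMode;

    NameNodeFilesModel() {
        files.add("fsImage");
        files.add("edits");
    }

    void setSafeMode(boolean on) { safeMode = on; }

    boolean exists(String name) { return files.contains(name); }

    // Stands in for the image upload: refuses to overwrite fsImage.tmp.
    void storeUploadedImage() {
        if (!files.add("fsImage.tmp")) {
            throw new IllegalStateException("fsImage.tmp already exists");
        }
    }

    // New edits go to edits.1 from now on.
    void rollEditLog() {
        if (safeMode) throw new IllegalStateException("NameNode is in SafeMode");
        if (files.contains("edits") && files.contains("edits.1")) {
            throw new IllegalStateException("both edits and edits.1 exist");
        }
        files.add("edits.1");
    }

    // Installs the uploaded image and promotes edits.1 to edits.
    void rollFsImage() {
        if (safeMode) throw new IllegalStateException("dfs is in SafeMode");
        for (String name : new String[] {"fsImage", "fsImage.tmp", "edits", "edits.1"}) {
            if (!files.contains(name)) {
                throw new IllegalStateException(name + " does not exist");
            }
        }
        files.remove("fsImage.tmp");  // its content replaces fsImage
        files.remove("edits.1");      // its content replaces edits
    }
}
```

In this model a second rollEditLog() before rollFsImage() is rejected, which is 
the behaviour that limits the design to one checkpoint in flight.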

The NameNode will have two additional servlets:
    * putFsImage.class
        This servlet causes all the incoming data to be stored in a file
        named fsImage.tmp in the dfs.name.dir directory. If this file already
        exists, then this call returns an error.

    * getFile.class?param=pathname
        This servlet retrieves the contents of the specified file.

At startup, the Primary NameNode deletes fsImage.tmp (if it exists). The 
NameNode loads the fsImage, then loads edits and then loads edits.1. Then 
it writes the merged fsImage and deletes edits and edits.1.
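
The startup sequence above could be sketched as follows. 
StartupRecoverySketch and its method names are hypothetical, the actual 
loading and merging of the namespace is left out, and java.nio.file is used 
only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing only the file handling at NameNode startup.
final class StartupRecoverySketch {

    // Discards a leftover fsImage.tmp and returns the existing files in load order.
    static List<Path> filesToLoad(Path nameDir) throws IOException {
        Files.deleteIfExists(nameDir.resolve("fsImage.tmp"));
        List<Path> order = new ArrayList<>();
        for (String name : new String[] {"fsImage", "edits", "edits.1"}) {
            Path candidate = nameDir.resolve(name);
            if (Files.exists(candidate)) {
                order.add(candidate);
            }
        }
        return order;
    }

    // To be called only after the merged fsImage has been written.
    static void discardEdits(Path nameDir) throws IOException {
        Files.deleteIfExists(nameDir.resolve("edits"));
        Files.deleteIfExists(nameDir.resolve("edits.1"));
    }
}
```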


Secondary NameNode
------------------
The Secondary NameNode periodically polls the NameNode with the 
getEditLogSize() RPC. This call returns the size of the current edit 
log. The Secondary NameNode initiates a checkpoint if either the size of the 
edit log exceeds the size specified in fs.checkpoint.size or the time 
since the last checkpoint completed exceeds fs.checkpoint.period.
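
The trigger condition amounts to a simple "either limit exceeded" test. The 
sketch below assumes that the edit log size is reported in bytes and that 1 MB 
means 1024 * 1024 bytes; neither is stated in the writeup. CheckpointTrigger 
is a hypothetical name.

```java
// Hypothetical helper for the checkpoint trigger; units are assumptions.
final class CheckpointTrigger {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // True if the edit log is larger than the size limit or the period has elapsed.
    static boolean shouldCheckpoint(long editLogBytes,
                                    long secondsSinceLastCheckpoint,
                                    long sizeLimitMb,
                                    long periodSeconds) {
        boolean tooLarge = editLogBytes > sizeLimitMb * BYTES_PER_MB;
        boolean tooOld = secondsSinceLastCheckpoint > periodSeconds;
        return tooLarge || tooOld;
    }
}
```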

The Secondary NameNode issues the rollEditLog() RPC to instruct the Primary 
NameNode to start logging edits into edits.1.  The Secondary NameNode then uses 
the getFile servlet to fetch the contents of fsImage and edits. It puts them in 
fs.checkpoint.dir, reads them into memory, merges them and writes the result 
to fsImage.tmp. The Secondary NameNode then uploads the fsImage.tmp file 
to the Primary NameNode using the putFsImage servlet.

Once the above steps are successful, the Secondary NameNode issues the 
rollFsImage() RPC. A checkpoint is complete when this RPC completes 
successfully.

If any of the RPC calls returns an error, the Secondary NameNode discards all 
processing that it might have done, logs an error message, and waits for the 
normal trigger to start the next checkpoint.
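
One checkpoint attempt, including the "discard and wait for the next trigger" 
error handling, might look like the sketch below. PrimaryNameNode and 
ImageMerger are hypothetical interfaces introduced only for this example: the 
two servlets are modelled as plain method calls, and the local copies kept in 
fs.checkpoint.dir are skipped in favour of in-memory byte arrays.

```java
import java.io.IOException;

// Hypothetical view of the Primary NameNode as seen by the Secondary.
interface PrimaryNameNode {
    void rollEditLog() throws IOException;
    byte[] getFile(String name) throws IOException;      // stands in for the getFile servlet
    void putFsImage(byte[] image) throws IOException;    // stands in for the putFsImage servlet
    void rollFsImage() throws IOException;
}

// Hypothetical merge step; the real merge logic is outside this sketch.
interface ImageMerger {
    byte[] merge(byte[] fsImage, byte[] edits) throws IOException;
}

final class CheckpointCycle {

    // Runs one checkpoint attempt; returns true only if rollFsImage() succeeded.
    static boolean runOnce(PrimaryNameNode primary, ImageMerger merger) {
        try {
            primary.rollEditLog();
            byte[] image = primary.getFile("fsImage");
            byte[] edits = primary.getFile("edits");
            byte[] merged = merger.merge(image, edits);
            primary.putFsImage(merged);
            primary.rollFsImage();
            return true;
        } catch (IOException e) {
            // Any failure abandons this attempt; the next trigger starts from scratch.
            System.err.println("Checkpoint abandoned: " + e.getMessage());
            return false;
        }
    }
}
```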

Issues
------
1. The emphasis is on simplicity. For this reason, the NameNode allows at most 
two outstanding edits files at any time: edits and edits.1. 
This ensures that there cannot be more than one Secondary NameNode for a 
Primary NameNode.

2. The fact that rollFsImage() fails if either edits or edits.1 does not 
exist means that the system is protected against a spurious checkpoint if 
the NameNode restarts while the Secondary NameNode is doing a merge. This check 
can be made more explicit by returning a cookie with the rollEditLog() command 
and enforcing that rollFsImage() supplies the same cookie. (The Primary 
NameNode resets the cookie if it restarts).
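
A minimal sketch of the cookie idea follows. CheckpointCookieGuard is a 
hypothetical class, and drawing the cookie from SecureRandom is an assumption 
made so that a cookie issued before a restart is unlikely to match one issued 
after it; the writeup does not say how cookies are generated.

```java
import java.security.SecureRandom;

// Hypothetical guard that ties rollFsImage() to the preceding rollEditLog().
final class CheckpointCookieGuard {
    private final SecureRandom random = new SecureRandom();
    private Long outstanding;  // null when no checkpoint is in progress

    // Issues a fresh cookie for the checkpoint that is starting.
    synchronized long rollEditLog() {
        outstanding = random.nextLong();
        return outstanding;
    }

    // Accepts the call only if it carries the cookie of the checkpoint in progress.
    synchronized boolean rollFsImage(long cookie) {
        if (outstanding == null || outstanding != cookie) {
            return false;
        }
        outstanding = null;  // each cookie is valid for one checkpoint only
        return true;
    }

    // Models a NameNode restart: any cookie handed out earlier becomes invalid.
    synchronized void restart() {
        outstanding = null;
    }
}
```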




> Namespace check pointing is not performed until the namenode restarts.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-227
>                 URL: http://issues.apache.org/jira/browse/HADOOP-227
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Konstantin Shvachko
>         Assigned To: dhruba borthakur
>         Attachments: patch-async-checkpoints-0.9.0, 
> patch-async-checkpoints-0.9.0, patch-async-checkpoints-0.9.0
>
>
> In the current implementation, when the name node starts, it reads its image 
> file, then the edits file, and then saves the updated image back into the 
> image file.
> The image file is never updated after that.
> In order to provide system reliability, the namespace information should 
> be check pointed periodically, and the edits file should be kept relatively 
> small.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
