[ http://issues.apache.org/jira/browse/HADOOP-227?page=comments#action_12458332 ] dhruba borthakur commented on HADOOP-227: -----------------------------------------
Here is a much detailed writeup on the Backup NameNode proposal. "Secondary NameNode" and "Backup NameNode" refer to the same node in this writeup. Please review and comment. Configuration ------------- There will be an additional file named "masters" in the configuration directory (similar to the "slaves" file) that will list the node names where Secondary NameNode should be run. The start-dfs.sh script will start the Secondary-NameNode appropriately. The configuration file will have a the following new definitions: * fs.checkpoint.dir : Location where the Secondary NameNode can download the fsImage and edits file. * fs.checkpoint.period : Time (in seconds) between two checkpoints. * fs.checkpoint.size : Size (in MB) of edit log that triggers a checkpoint. The Secondary NameNode will use "org.apache.hadoop.dfs.NameNode.Alternate" property to log its debug and informational messages. Primary NameNode -------------------------- The Primary NameNode will add the following new RPCs to the ClientProtocol: * getEditLogSize() This call returns the size of the current edit log file. This call fails if the NameNode is in SafeMode or there are more than one edit log file. * rollEditLog() This call closes the current edit log and opens a new edit log file. The names of the edit files are either "edits" or "edits.new". To keep complexity to a minimum, there will be a max of two edit log files "edits" and "edits.1". This call returns an error if any of the following conditions occur: - NameNode is in SafeMode - Both "edits" and 'edits.new" are already pre-existing * rollFsImage() This call does the following steps (atomically): - removes fsImage - copies fsImage.tmp to fsImage - removes edits - moves edit.new to edits This call fails if any of the files fsImage, fsImage.new or edits does not exist. It also fails if the dfs is in SafeMode. The NameNode will have two additional servlets: * putFsImage.class This servlet causes all the incoming data to be stored in a file named fsImage.tmp in the dfs.name.dir directory. If this file already exists, then this call returns error. * getFile.class?param=pathname This servlet retrieves the contents of the specified file. The Primary NameNode at startup time deletes fsImage.tmp (if it exists). The NameNode loads the fsImage, then loads the edits and then loads edits.1. Then it writes the merged fsImage, deletes edits and edits.1. Secondary NameNode ------------------------------- The Secondary NameNode periodically pings the NameNode with the getCurrentEditLogSize() RPC. This call returns the size of the current edit log. The Secondary NameNode initiates a checkpoint if either the size of the edit log exceeds the size specified in the fs.checkpoint.size or if the time since last checkpoint completion has exceeded fs.checkpoint.period. The Secondary NameNode issues the rollEditLog() RPC to instruct the Primary NameNode to start logging edits into edits.1. The Secondary NameNode then uses the getFile servlet to fetch the contents of fsImage and edits. It puts them in the fs.checkpoint.dir and, reads them into memory, merges them and writes it back to fsImage.tmp. The Secondary NameNode than uploads the fsImage.tmp file to the Primary NameNode using the putFsImage servlet. Once the above steps are successful, the Secondary NameNode issues the rollFsImage() RPC. A checkpoint is complete when this RPC completes successfully. If any of the RPC calls returns an error, the Secondary NameNode discards all processing that it might have done, logs an error message, and waits for the normal trigger to start the next checkpoint. Issues ------ 1. The emphasis is on simplicity. For this reason, the NameNode restricts that there can be only two outstanding edits file at any time: edits and edits.1. This ensures that there cannot be more than one Secondary NameNode for a Primary NameNode. 2. The fact that rollFsImage() fails if either edits or edits.1 are non-existent means that the system is protected against spurious checkpoint if the NameNode restarts when the Secondary NameNode was doing a merge. This check can be made more explicit by returning a cookie with the rollEditLog() command and enforcing that rollFsImage() supplies the same cookie. (The Primary NameNode resets the cookie if it restarts). > Namespace check pointing is not performed until the namenode restarts. > ---------------------------------------------------------------------- > > Key: HADOOP-227 > URL: http://issues.apache.org/jira/browse/HADOOP-227 > Project: Hadoop > Issue Type: Bug > Components: dfs > Affects Versions: 0.2.0 > Reporter: Konstantin Shvachko > Assigned To: dhruba borthakur > Attachments: patch-async-checkpoints-0.9.0, > patch-async-checkpoints-0.9.0, patch-async-checkpoints-0.9.0 > > > In current implementation when the name node starts, it reads its image file, > then > the edits file, and then saves the updated image back into the image file. > The image file is never updated after that. > In order to provide the system reliability reliability the namespace > information should > be check pointed periodically, and the edits file should be kept relatively > small. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira