[ 
https://issues.apache.org/jira/browse/HDFS-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026872#comment-13026872
 ] 

Todd Lipcon commented on HDFS-1801:
-----------------------------------

The previous patch attached here broke TestNameEditsConfig due to the following 
situation:

Imagine there is one image dir, /image/1, and two edits dirs, /edits/1 and 
/edits/2. You have the following sequence:
- Start new NN
- Write some edits which go to both edits dirs
- /edits/1 fails
- Write some more edits, now going only to /edits/2
- NN crashes
- /edits/1 is recovered but /edits/2 goes offline
- NN restarts.

We used to be able to detect this situation with the fstime, which we 
incremented on failures, so a directory that missed the failure carried an 
older value. But time is an arbitrary measure, and now that we have txids, 
it's better to record the txid.

The new version of this patch still gets rid of fstime, but creates a new file 
called {{seen_txid}}, which is occasionally rewritten with the current txid. 
It currently gets rewritten on failure and on roll, but the rewrite could also 
be triggered after some number of transactions.
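
To make the idea concrete, here is a minimal sketch of rewriting the marker. 
It is not the code from the patch: the class and method names are made up, and 
it assumes {{seen_txid}} is a small text file holding a single decimal txid, 
written into every storage directory that is still usable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the HDFS-1801 patch.
public class SeenTxidWriter {
    // Assumption: the marker is a plain text file named seen_txid.
    static final String SEEN_TXID = "seen_txid";

    /**
     * Writes txid into seen_txid in each given directory.
     * Returns the number of directories that were written successfully.
     */
    static int writeSeenTxid(List<Path> liveDirs, long txid) {
        byte[] data = (Long.toString(txid) + "\n").getBytes(StandardCharsets.UTF_8);
        int written = 0;
        for (Path dir : liveDirs) {
            try {
                // Creates the file or truncates an existing one.
                Files.write(dir.resolve(SEEN_TXID), data);
                written++;
            } catch (IOException e) {
                // In this sketch a directory that cannot be written is skipped.
            }
        }
        return written;
    }

    /** Reads the stored txid, or returns -1 if the directory has no marker. */
    static long readSeenTxid(Path dir) throws IOException {
        Path file = dir.resolve(SEEN_TXID);
        if (!Files.exists(file)) {
            return -1;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }
}
```

A caller would invoke {{writeSeenTxid}} from the failure and roll paths. A real 
implementation would probably want an atomic write (temp file plus rename), 
which this sketch leaves out.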

On startup, the NN will look across all configured directories and find the 
maximum txid stored in any {{seen_txid}} file. If it can't find edits that 
include this txid, it will refuse to start.
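
The startup check can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the patch itself: {{StorageDir}} 
and {{verifyEditsCoverSeenTxid}} are hypothetical names, each directory is 
reduced to two numbers, and the txids in {{main}} are invented for the 
scenario above (the marker was last rewritten at txid 15, after /edits/1 had 
stopped receiving edits at txid 10).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the startup check, not the HDFS-1801 code.
public class SeenTxidCheck {

    /** Simplified view of one storage directory as found at startup. */
    static final class StorageDir {
        final String path;
        final long seenTxid;     // value read from seen_txid, or -1 if absent
        final long lastEditTxid; // highest txid in this dir's edits, or -1 if none

        StorageDir(String path, long seenTxid, long lastEditTxid) {
            this.path = path;
            this.seenTxid = seenTxid;
            this.lastEditTxid = lastEditTxid;
        }
    }

    /**
     * Returns the maximum seen_txid across the available directories.
     * Throws if no available directory has edits reaching that txid.
     */
    static long verifyEditsCoverSeenTxid(List<StorageDir> dirs) {
        long maxSeen = -1;
        long maxEdit = -1;
        for (StorageDir d : dirs) {
            maxSeen = Math.max(maxSeen, d.seenTxid);
            maxEdit = Math.max(maxEdit, d.lastEditTxid);
        }
        if (maxEdit < maxSeen) {
            throw new IllegalStateException("seen_txid is " + maxSeen
                + " but available edits end at txid " + maxEdit);
        }
        return maxSeen;
    }

    public static void main(String[] args) {
        // At restart /edits/2 is offline, so only these two dirs are visible.
        List<StorageDir> atRestart = Arrays.asList(
            new StorageDir("/image/1", 15, -1),  // marker survived here
            new StorageDir("/edits/1", -1, 10)); // stale edits, no newer marker
        try {
            verifyEditsCoverSeenTxid(atRestart);
            System.out.println("NN would start");
        } catch (IllegalStateException e) {
            System.out.println("NN refuses to start: " + e.getMessage());
        }
    }
}
```

With /edits/2 available instead (edits up to some txid at or beyond 15), the 
same check passes and startup can proceed.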

This addition makes TestNameEditsConfig pass again.

> Remove use of timestamps to identify checkpoints and logs
> ---------------------------------------------------------
>
>                 Key: HDFS-1801
>                 URL: https://issues.apache.org/jira/browse/HDFS-1801
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-1801.txt, hdfs-1801.txt
>
>
> Currently, the NameNode validates checkpoint uploads by using timestamps 
> associated with checkpoints and edit logs. However, now that we have 
> transaction IDs that uniquely identify each point in time in the history of a 
> namespace, it is more robust to simply use transaction IDs to identify images 
> and edits.
> This JIRA is to remove the use of editsTime and checkpointTime and replace it 
> with:
> * {{lastCheckpointTxId}} - the highest transaction ID reflected in the most 
> recently saved fsimage file
> * {{lastLogRollTxId}} - the highest transaction ID in {{edits}} when 
> {{rollFsImage}} was called by the checkpointing node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
