[ 
https://issues.apache.org/jira/browse/FLUME-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449865#comment-13449865
 ] 

Ted Malaska commented on FLUME-1516:
------------------------------------

Hmm, I'm trying to follow.

Hari is saying the writing process is as follows:
1. Write event A to checkpoint.tmp
2. Rename checkpoint to checkpoint.old
3. Rename checkpoint.tmp to checkpoint
4. Write event A to checkpoint.old
5. Rename checkpoint.old to checkpoint.tmp
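
To make sure I have the sequence right, here is a minimal Java sketch of the five write steps. It is not the Flume implementation: the class name, the append helper, the one-event-per-line text format, and the bootstrap copy on a fresh directory are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical class, named for this sketch only.
class DualCheckpointWriter {

    // Hypothetical helper: appends one event as a text line, creating the file if needed.
    static void append(Path file, String event) throws IOException {
        Files.write(file, (event + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    static void writeEvent(Path dir, String event) throws IOException {
        Path current = dir.resolve("checkpoint");
        Path tmp = dir.resolve("checkpoint.tmp");
        Path old = dir.resolve("checkpoint.old");

        // Bootstrap (assumption, not part of the five steps): if checkpoint.tmp is
        // missing, seed it from the live checkpoint so both copies hold the same events.
        if (!Files.exists(tmp) && Files.exists(current)) {
            Files.copy(current, tmp);
        }

        append(tmp, event);                                            // 1. write to checkpoint.tmp
        if (Files.exists(current)) {
            Files.move(current, old, StandardCopyOption.ATOMIC_MOVE);  // 2. checkpoint -> checkpoint.old
        }
        Files.move(tmp, current, StandardCopyOption.ATOMIC_MOVE);      // 3. checkpoint.tmp -> checkpoint
        if (Files.exists(old)) {
            append(old, event);                                        // 4. write to checkpoint.old
            Files.move(old, tmp, StandardCopyOption.ATOMIC_MOVE);      // 5. checkpoint.old -> checkpoint.tmp
        }
    }
}
```

In this sketch, checkpoint is missing only between steps 2 and 3, and during that window checkpoint.tmp already contains the new event.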

Then the reading process would be as follows:
1. Try to read from checkpoint
2. If checkpoint is not there, try to read from checkpoint.tmp
3. Otherwise, read from checkpoint.old (this shouldn't happen)
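
The read-side fallback could be sketched like this in Java. Again, this is only an illustration: the class and method names are hypothetical, and returning an empty Optional when no file exists is my assumption about how a fresh directory would be handled.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical class, named for this sketch only.
class CheckpointReader {

    // Returns the file to read, following the order: checkpoint, checkpoint.tmp, checkpoint.old.
    static Optional<Path> selectCheckpoint(Path dir) {
        Path current = dir.resolve("checkpoint");
        if (Files.exists(current)) {
            return Optional.of(current);       // 1. normal case
        }
        Path tmp = dir.resolve("checkpoint.tmp");
        if (Files.exists(tmp)) {
            return Optional.of(tmp);           // 2. checkpoint is missing
        }
        Path old = dir.resolve("checkpoint.old");
        if (Files.exists(old)) {
            return Optional.of(old);           // 3. should not happen
        }
        return Optional.empty();               // assumption: no checkpoint has been written yet
    }
}
```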

Let me know if this aligns with your thinking. If it does, I will attempt to
write the fix.
                
> Write Dual Checkpoints to avoid replays
> ---------------------------------------
>
>                 Key: FLUME-1516
>                 URL: https://issues.apache.org/jira/browse/FLUME-1516
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel
>    Affects Versions: v1.3.0
>            Reporter: Brock Noland
>
> Per the LFS paper (http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf) we can 
> write two checkpoints to avoid replaying the logs in the case we 
> crash/shutdown while writing a checkpoint.
> Section 4:
> "In order to handle a crash during a checkpoint operation there are actually 
> two checkpoint regions, and checkpoint operations alternate between them. The 
> checkpoint time is in the last block of the checkpoint so if the checkpoint 
> fails the time will not be updated. During reboot, the system reads both 
> checkpoint regions and uses the one with the most recent time."
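
For comparison, the scheme quoted from the LFS paper (two regions, timestamp written last, newest timestamp wins on restart) could be sketched as follows. Everything here is an assumption for illustration: the class name, the fixed 64-byte payload size, the 8-byte timestamp trailer, and the use of -1 for a region that has never been completed. It is not how Flume stores checkpoints.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical class, named for this sketch only.
class AlternatingCheckpoints {
    static final int PAYLOAD_SIZE = 64; // assumed fixed region size

    // Writes the payload first and the timestamp last, so an interrupted
    // write leaves the region's previous (older) timestamp in place.
    static void write(Path region, byte[] payload, long time) throws IOException {
        if (payload.length > PAYLOAD_SIZE) {
            throw new IllegalArgumentException("payload too large");
        }
        try (FileChannel ch = FileChannel.open(region,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer body = ByteBuffer.allocate(PAYLOAD_SIZE);
            body.put(payload);
            body.rewind(); // write the full zero-padded region from offset 0
            while (body.hasRemaining()) {
                ch.write(body, body.position());
            }
            ch.force(true); // payload reaches disk before the timestamp is updated
            ByteBuffer stamp = ByteBuffer.allocate(Long.BYTES);
            stamp.putLong(time);
            stamp.flip();
            while (stamp.hasRemaining()) {
                ch.write(stamp, PAYLOAD_SIZE + stamp.position());
            }
            ch.force(true);
        }
    }

    // Returns the region's timestamp, or -1 if the region was never completed.
    static long readTime(Path region) throws IOException {
        if (!Files.exists(region) || Files.size(region) < PAYLOAD_SIZE + Long.BYTES) {
            return -1L;
        }
        byte[] all = Files.readAllBytes(region);
        return ByteBuffer.wrap(all, PAYLOAD_SIZE, Long.BYTES).getLong();
    }

    // On restart: use the region with the most recent time.
    static Path newest(Path a, Path b) throws IOException {
        return readTime(a) >= readTime(b) ? a : b;
    }

    // For the next checkpoint: overwrite the other (older) region.
    static Path nextTarget(Path a, Path b) throws IOException {
        return newest(a, b).equals(a) ? b : a;
    }
}
```

The difference from the rename-based sequence is that no file is ever moved; the two regions simply take turns, and the timestamp decides which one is trusted.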

