[
https://issues.apache.org/jira/browse/FLUME-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449878#comment-13449878
]
Brock Noland commented on FLUME-1516:
-------------------------------------
Hi Ted,
I think what we want to do is something like this:
1) On startup copy checkpoint to checkpoint.tmp
2) When checkpointing
2.1) write data to checkpoint.tmp
2.2) Rename checkpoint.tmp to checkpoint
2.3) In background, copy checkpoint to checkpoint.tmp for the next checkpoint
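
A rough sketch of those steps in Java, to make the ordering concrete. This is
not Flume's actual checkpoint code: the class name TmpCheckpointSketch and its
methods are hypothetical, step 2.3 is taken to mean re-creating checkpoint.tmp
from the freshly renamed checkpoint, and the whole checkpoint is passed in as
one byte array. A real implementation would presumably update checkpoint.tmp
in place (which is why it is seeded with a copy) and would also need to force
the data to disk before the rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the checkpoint.tmp scheme; not Flume's real code. */
final class TmpCheckpointSketch {
    private final Path checkpoint;
    private final Path tmp;

    TmpCheckpointSketch(Path dir) {
        this.checkpoint = dir.resolve("checkpoint");
        this.tmp = dir.resolve("checkpoint.tmp");
    }

    /** Steps 1 and 2.3: seed checkpoint.tmp from the current checkpoint, if any. */
    void prepareTmp() {
        try {
            if (Files.exists(checkpoint)) {
                Files.copy(checkpoint, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Steps 2.1 and 2.2: write to checkpoint.tmp, then rename it over checkpoint. */
    void writeCheckpoint(byte[] data) {
        try {
            // 2.1) All writes go to the temporary file, never to the live checkpoint.
            Files.write(tmp, data);
            // 2.2) Rename. Whether ATOMIC_MOVE replaces an existing target is
            // implementation specific; on POSIX file systems it maps to rename().
            Files.move(tmp, checkpoint, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the last completed checkpoint. */
    byte[] readCheckpoint() {
        try {
            return Files.readAllBytes(checkpoint);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would invoke prepareTmp() once at startup and again after each
writeCheckpoint(), for example from a background thread, so that a crash in
the middle of 2.1 only ever damages checkpoint.tmp.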
However, there are a ton of pending changes to this code in FLUME-1487 so any
change might be best delayed until after we merge that patch. If we want to
tackle it now we certainly could, just trying to save myself a little work!
> Write Dual Checkpoints to avoid replays
> ---------------------------------------
>
> Key: FLUME-1516
> URL: https://issues.apache.org/jira/browse/FLUME-1516
> Project: Flume
> Issue Type: Improvement
> Components: Channel
> Affects Versions: v1.3.0
> Reporter: Brock Noland
>
> Per the LFS paper (http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf) we can
> write two checkpoints to avoid replaying the logs in the case we
> crash/shutdown while writing a checkpoint.
> Section 4:
> "In order to handle a crash during a checkpoint operation there are actually
> two checkpoint regions, and checkpoint operations alternate between them. The
> checkpoint time is in the last block of the checkpoint so if the checkpoint
> fails the time will not be updated. During reboot, the system reads both
> checkpoint regions and uses the one with the most recent time."
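
For comparison, the alternating-region idea from the quoted LFS passage could
be sketched as below. Everything here is an assumption made for illustration:
the class DualCheckpointSketch, the file names checkpoint.0 and checkpoint.1,
the [length][payload][stamp] layout, and the use of an increasing sequence
number in place of the paper's checkpoint time. The stamp is written last, so
a region whose write was cut short fails the length check and is ignored. A
real implementation would likely also want a checksum and an explicit sync.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of two alternating checkpoint regions; not Flume's real code. */
final class DualCheckpointSketch {
    private static final class Region {
        final long stamp;
        final byte[] payload;
        Region(long stamp, byte[] payload) { this.stamp = stamp; this.payload = payload; }
    }

    private final Path[] regions;
    private long lastStamp;
    private int next;

    DualCheckpointSketch(Path dir) {
        regions = new Path[] { dir.resolve("checkpoint.0"), dir.resolve("checkpoint.1") };
        Region r0 = read(regions[0]);
        Region r1 = read(regions[1]);
        long s0 = r0 == null ? 0 : r0.stamp;
        long s1 = r1 == null ? 0 : r1.stamp;
        lastStamp = Math.max(s0, s1);
        next = s0 > s1 ? 1 : 0; // always overwrite the older (or missing) region
    }

    /** Writes the payload to the older region; the stamp goes last. */
    void write(byte[] payload) {
        long stamp = lastStamp + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 8);
        buf.putInt(payload.length).put(payload).putLong(stamp);
        try {
            Files.write(regions[next], buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lastStamp = stamp;
        next = 1 - next; // alternate between the two regions
    }

    /** Returns the payload of the valid region with the newest stamp, or null. */
    byte[] readLatest() {
        Region r0 = read(regions[0]);
        Region r1 = read(regions[1]);
        if (r0 == null && r1 == null) return null;
        if (r0 == null) return r1.payload;
        if (r1 == null) return r0.payload;
        return r0.stamp >= r1.stamp ? r0.payload : r1.payload;
    }

    /** Parses one region; returns null if it is missing or incomplete. */
    private static Region read(Path p) {
        try {
            if (!Files.exists(p)) return null;
            byte[] raw = Files.readAllBytes(p);
            if (raw.length < 12) return null;
            ByteBuffer buf = ByteBuffer.wrap(raw);
            int len = buf.getInt();
            if (len < 0 || raw.length != 12L + len) return null;
            byte[] payload = new byte[len];
            buf.get(payload);
            return new Region(buf.getLong(), payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, losing or corrupting the region being written still leaves
the other region readable, so startup can fall back to the previous checkpoint
instead of replaying the logs from scratch.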