[
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443556#comment-13443556
]
Todd Lipcon commented on HDFS-3863:
-----------------------------------
The design here is pretty simple, given the way our journaling protocol works.
In particular, we only have one outstanding "batch" of transactions at once. We
never send a batch of transactions beginning at txid N until the prior batch
(up through N-1) has been accepted at a quorum of nodes. Thus, any
{{sendEdits()}} call with {{firstTxId}} N implies a {{commit(N-1)}}.
So, my plan is as follows:
- Introduce a new file inside the journal directory called {{committed-txid}}.
This would include a single numeric text line, similar to the {{seen_txid}}
that the NameNode maintains.
- Since this whole feature is not required for correctness, we don't need to
fsync this file on every update. Instead, we can let the operating system write
it out to disk whenever it so chooses. If, after a system crash, it reverts to
an earlier value, this is OK, since our recovery protocol doesn't depend on it
being up-to-date in any way. Put another way, the invariant is that the file
contains a value which is a lower bound on the latest committed txn.
- The file would be updated whenever a {{sendEdits()}} call is made -- the call
implicitly commits all edits prior to the current batch.
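To make the plan above concrete, here is a minimal sketch of the JournalNode-side bookkeeping. It is illustrative only and is not the actual patch: the class name {{CommittedTxIdTracker}}, its method names, the use of 0 to mean "nothing known to be committed", and the write-to-temp-then-rename step are all assumptions of mine. The file is deliberately never fsynced, so after a crash it may hold an older value, which preserves the lower-bound invariant.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch; not the real JournalNode implementation. */
class CommittedTxIdTracker {
    private final Path file;     // <journal dir>/committed-txid
    private long committedTxId;  // lower bound on the latest committed txid

    CommittedTxIdTracker(Path journalDir) throws IOException {
        this.file = journalDir.resolve("committed-txid");
        // Assumption: a missing file means nothing is known to be committed (0).
        this.committedTxId = Files.exists(file)
            ? Long.parseLong(new String(Files.readAllBytes(file),
                                        StandardCharsets.UTF_8).trim())
            : 0L;
    }

    /** A batch starting at firstTxId implies commit(firstTxId - 1). */
    synchronized void onSendEdits(long firstTxId) throws IOException {
        advanceTo(firstTxId - 1);
    }

    /** Advance the commit point; the value never moves backwards. */
    synchronized void advanceTo(long txId) throws IOException {
        if (txId <= committedTxId) {
            return;
        }
        // Single numeric text line. No fsync on purpose: the OS may flush
        // it whenever it likes, and a stale value after a crash is fine.
        // Writing a temp file and renaming is my own choice, to avoid
        // leaving a half-written number behind.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, (txId + "\n").getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        committedTxId = txId;
    }

    synchronized long get() {
        return committedTxId;
    }

    /** Sanity check for recovery: refuse to truncate below the commit point. */
    synchronized void checkTruncation(long lastTxIdToKeep) {
        if (lastTxIdToKeep < committedTxId) {
            throw new IllegalStateException("Asked to truncate to txid "
                + lastTxIdToKeep + " but txid " + committedTxId
                + " is already committed");
        }
    }
}
```

With this sketch, a {{sendEdits()}} call with {{firstTxId}} 101 would move the recorded value to 100, and a later recovery request that keeps only transactions up through 99 would be rejected.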
This alone is enough for a good sanity check. If we want to also support
reading the committed transactions while in-progress, it's not quite sufficient
-- the last batch of transactions will never be readable if the NN stops
writing new batches for a protracted period of time. To solve this, we can add
a timer thread to the client which periodically (e.g., once or twice a second)
sends an RPC to update the committed-txid on all of the nodes. The periodic
timer will also have the nice property of causing a NN which has been fenced to
abort itself even if no write transactions are taking place.
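The timer could look roughly like the sketch below. Everything named here is hypothetical: {{JournalNodeProxy}} stands in for whatever RPC proxy the client holds for each node, {{updateCommittedTxId}} is an invented method name, and treating "fewer than a majority of nodes accepted the update" as the signal to abort is my assumption about how a fenced NN would notice.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the client-side commit timer. */
class CommitHeartbeat implements AutoCloseable {
    /** Stand-in for an RPC proxy to one JournalNode (assumed interface). */
    interface JournalNodeProxy {
        // Assumed to throw if the node rejects the caller, e.g. when fenced.
        void updateCommittedTxId(long txId) throws Exception;
    }

    private final List<JournalNodeProxy> nodes;
    private final Runnable onFenced;
    private final AtomicLong lastCommitted = new AtomicLong();
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    CommitHeartbeat(List<JournalNodeProxy> nodes, Runnable onFenced) {
        this.nodes = nodes;
        this.onFenced = onFenced;
    }

    /** Called by the writer once a batch has been accepted by a quorum. */
    void setCommitted(long txId) {
        lastCommitted.accumulateAndGet(txId, Math::max);
    }

    /** Start the periodic update, e.g. start(500) for twice a second. */
    void start(long periodMillis) {
        timer.scheduleWithFixedDelay(this::sendOnce, periodMillis,
                                     periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Send the current commit point to every node once. */
    void sendOnce() {
        long txId = lastCommitted.get();
        int successes = 0;
        for (JournalNodeProxy node : nodes) {
            try {
                node.updateCommittedTxId(txId);
                successes++;
            } catch (Exception e) {
                // Rejected or unreachable; counted as a failure.
            }
        }
        int majority = nodes.size() / 2 + 1;
        if (successes < majority) {
            // Assumption: without a quorum the writer aborts itself.
            onFenced.run();
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

The loop here is sequential for brevity; a real client would presumably issue the RPCs in parallel, as it does for other quorum calls.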
> QJM: track last "committed" txid
> --------------------------------
>
> Key: HDFS-3863
> URL: https://issues.apache.org/jira/browse/HDFS-3863
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> Per some discussion with [~stepinto]
> [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
> we should keep track of the "last committed txid" on each JournalNode. Then
> during any recovery operation, we can sanity-check that we aren't asked to
> truncate a log to an earlier transaction.
> This is also a necessary step if we want to support reading from in-progress
> segments in the future (since we should only allow reads up to the commit
> point)