[
https://issues.apache.org/jira/browse/HDFS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252733#comment-13252733
]
Bikas Saha commented on HDFS-3092:
----------------------------------
Combination of this and HDFS-3077 sound very much like ZAB with one difference
- in the use of 2-phase commit for the write broadcast.
Lets say there is 1 active NN writing to a quorum set of journal daemons. This
is the same as ZAB. Active NN writes edits and ZAB leader writes new states.
ZAB uses 2 phase commits (without abort) for each write while our design is
getting away without it. I am wondering why we can get away with it.
My guess is that each follower in ZAB can also serve reads from clients. Hence,
it cannot serve an update until it is guaranteed that a quorum of followers has
agreed on that update. That is what 2 phase commit gives.
In our case, the active NN is the only server for client reads. Hence, updates
are not served to clients until a quorum acks back.
However, the above would break for us if the standby NN is using any journal
daemon to refresh its state. Because, ideally, a journal node should not inform
the standby about an update until the it knows that the update has been
accepted by a quorum of journal daemons. That would require a 2 phase commit.
E.g. Standby NN3 reads the last edit written to JN1 by old active NN1, before
NN1 realized that it has lost quorum to NN2 (by failing to write to JN2 and
JN3).
Perhaps we can get away with this by using some assumptions on timeouts, or by
additional constraints on the standby. Eg. that it only syncs with finalized
edit segments.
If we say that the standby sync with only the finalized log segments in order
to be safe from the above, then IMO, the tailing of the edits by the standby
should not be done by the standby directly but via a journal daemon API for the
standby. This JD API would ensure that only valid edits are being sent to the
standby (edits from finalized segments or edits known to be safely committed to
a quorum of journal daemons). This way the correctness of the journal protocol
would remain inside it. Instead of leaking it into the standby by having the
standby code remember rules for tailing edits.
> Enable journal protocol based editlog streaming for standby namenode
> --------------------------------------------------------------------
>
> Key: HDFS-3092
> URL: https://issues.apache.org/jira/browse/HDFS-3092
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, name-node
> Affects Versions: 0.24.0, 0.23.3
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: MultipleSharedJournals.pdf, MultipleSharedJournals.pdf,
> MultipleSharedJournals.pdf
>
>
> Currently standby namenode relies on reading shared editlogs to stay current
> with the active namenode, for namespace changes. BackupNode used streaming
> edits from active namenode for doing the same. This jira is to explore using
> journal protocol based editlog streams for the standby namenode. A daemon in
> standby will get the editlogs from the active and write it to local edits. To
> begin with, the existing standby mechanism of reading from a file, will
> continue to be used, instead of from shared edits, from the local edits.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira