[
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470572#comment-13470572
]
Sanjay Radia commented on HDFS-3077:
------------------------------------
Suresh and I have compared the design to Paxos and ZAB in detail and have
concluded that it is closer to ZAB than to Paxos.
* In both cases, recovery establishes a leader and syncs missing
transactions across a number of journal-participants. Once recovery completes,
the leader writes future transactions to the journal-participants.
* The txid (called zxid in ZAB) is used in similar ways in both cases, except
that in ZAB the epoch is part of the transaction id.
* The recovery process discovers the highest txid, and then arranges to sync
the missing transactions across the participant journals.
* The steps are very similar, except that the HDFS-3077 design has an extra
initial step. If newEpoch and prepareRecovery are merged, the HDFS-3077
design will become the same as ZAB.
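
To make the txid/zxid point and the "discover the highest txid, then sync"
step concrete, here is a rough sketch. It is illustrative only and is not
taken from the HDFS-3077 patch or from the ZooKeeper code. The zxid layout
(epoch in the high 32 bits, counter in the low 32 bits of a 64-bit value)
follows ZooKeeper's convention; the function names, the journal names and
the plan_sync helper are hypothetical, and a real recovery would also need
quorum and epoch checks that are left out here.

```python
from typing import Dict, Tuple

EPOCH_SHIFT = 32                         # ZooKeeper convention: high 32 bits hold the epoch
COUNTER_MASK = (1 << EPOCH_SHIFT) - 1    # low 32 bits hold the per-epoch counter


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into one ZAB-style transaction id."""
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter does not fit in 32 bits")
    return (epoch << EPOCH_SHIFT) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


def plan_sync(last_txids: Dict[str, int]) -> Tuple[int, Dict[str, range]]:
    """Hypothetical helper: given the last txid each participant reports,
    find the highest one and the txid range each lagging participant lacks."""
    highest = max(last_txids.values())
    missing = {
        name: range(last + 1, highest + 1)
        for name, last in last_txids.items()
        if last < highest
    }
    return highest, missing


if __name__ == "__main__":
    # Because the epoch sits in the high bits, plain integer comparison
    # orders transactions by epoch first and by counter second.
    print(make_zxid(2, 0) > make_zxid(1, COUNTER_MASK))
    print(plan_sync({"jn1": 10, "jn2": 7, "jn3": 10}))
```

With the epoch packed into the id, a single integer comparison is enough to
order transactions across leader changes, which is the difference from a
design that carries the epoch separately from the txid.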
The proposal is to merge the first two steps, model the protocol after ZAB,
and use the ZAB terminology. We have discussed some of the implementation
details with Mahadev of the ZK team and can benefit from insights into some of
ZK's lower-level details and the corner cases they deal with. There are some
details on what is persisted and when it is persisted that we would like to
discuss further.
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: ha, name-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3077-partial.txt, hdfs-3077-test-merge.txt,
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt,
> hdfs-3077.txt, hdfs-3077.txt, qjournal-design.pdf, qjournal-design.pdf,
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf,
> qjournal-design.pdf, qjournal-design.tex, qjournal-design.tex
>
>
> Currently, one of the weak points of the HA design is that it relies on
> shared storage such as an NFS filer for the shared edit log. One alternative
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject
> which provides a highly available replicated edit log on commodity hardware.
> This JIRA is to implement another alternative, based on a quorum commit
> protocol, integrated more tightly in HDFS and with the requirements driven
> only by HDFS's needs rather than more generic use cases. More details to
> follow.