[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423621#comment-13423621 ]
Chao Shi commented on HDFS-3077:
--------------------------------
bq. I feel that we can throw a special kind of fatal exception rather than an
ordinary IOException if any inconsistent state is found (e.g. a JN's epoch >
QJM's epoch). A fatal exception means that QJM must immediately stop its work.
This may be caused by misconfiguration or software bugs. Because the journal
is so critical to HDFS clusters, we should do our best to detect any possible
mistakes or bugs.
I thought this over again today and found that my example "JN's epoch > QJM's
epoch" may be wrong, because that is the normal case when an old writer is
fenced. When an InvariantViolatedException is thrown, we expect someone on call
to be paged and to check the cluster immediately, so false alarms would be
costly. The exception should therefore be reserved for states that can never
occur in correct operation.
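To make the distinction concrete, here is a minimal sketch, not taken from the
patch. Everything in it is hypothetical: the class and method names, the choice
to make InvariantViolatedException unchecked so that a generic
"catch (IOException e)" retry path cannot swallow it, and the sample invariant
(a JN acknowledging a txid lower than one it already acknowledged). The only
point is that a higher JN epoch is treated as ordinary fencing, while the fatal
exception is kept for impossible states.

```java
import java.io.IOException;

// Hypothetical sketch; none of these names come from the HDFS-3077 patch.
class EpochCheckSketch {

    /** Fatal: a state that should be impossible. Unchecked on purpose (assumption). */
    static class InvariantViolatedException extends RuntimeException {
        InvariantViolatedException(String msg) { super(msg); }
    }

    /** Ordinary failure: a newer writer has taken over and fenced this one. */
    static class FencedException extends IOException {
        FencedException(String msg) { super(msg); }
    }

    enum Outcome { OK, FENCED, FATAL }

    /** Validates one JN response against the writer's view of the world. */
    static void checkResponse(long writerEpoch, long jnEpoch,
                              long lastAckedTxId, long ackedTxId) throws IOException {
        if (jnEpoch > writerEpoch) {
            // Normal case: an old writer being fenced. Not an invariant violation.
            throw new FencedException(
                "JN epoch " + jnEpoch + " > writer epoch " + writerEpoch);
        }
        if (ackedTxId < lastAckedTxId) {
            // Hypothetical invariant: acknowledged txids never move backwards.
            throw new InvariantViolatedException(
                "acked txid went from " + lastAckedTxId + " to " + ackedTxId);
        }
    }

    /** Shows how a caller might route the two kinds of failure differently. */
    static Outcome handle(long writerEpoch, long jnEpoch,
                          long lastAckedTxId, long ackedTxId) {
        try {
            checkResponse(writerEpoch, jnEpoch, lastAckedTxId, ackedTxId);
            return Outcome.OK;
        } catch (InvariantViolatedException e) {
            return Outcome.FATAL;   // stop the writer and page someone
        } catch (IOException e) {
            return Outcome.FENCED;  // expected failure: step down quietly
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(5, 5, 10, 11)); // consistent response
        System.out.println(handle(5, 6, 10, 11)); // fenced by a newer writer
        System.out.println(handle(5, 5, 10, 9));  // impossible state
    }
}
```

Whether the fatal exception should be checked or unchecked is a design
question this sketch does not settle. Whichever is chosen, the fencing path
must never raise it.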
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: ha, name-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt,
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt,
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on
> shared storage such as an NFS filer for the shared edit log. One alternative
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject
> which provides a highly available replicated edit log on commodity hardware.
> This JIRA is to implement another alternative, based on a quorum commit
> protocol, integrated more tightly in HDFS and with the requirements driven
> only by HDFS's needs rather than more generic use cases. More details to
> follow.