[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472099#comment-13472099
 ] 

Sanjay Radia commented on HDFS-3077:
------------------------------------

bq. Currently, we only run recovery on the highest txid segment at startup. 
This means that every segment is stored on at least a quorum of nodes. But it 
does not mean that previous segments get replicated to all available nodes.
This wasn't obvious from the HDFS-3077 design document, and it is a limitation 
of HDFS-3077; don't you agree? Segment holes are operationally messy when 
manual recovery is necessary in the field.
Do the following two suggestions make sense?
# When a JN joins, it must have synced all previous segments before accepting 
new writes.
# At recovery, sync any missing segments. (Because of 1, a JN may be missing 
several segments, but the missing segments are all at the end; there cannot be 
holes.)
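
To make the intent of the two suggestions concrete, here is a rough sketch of 
the invariant. It is only an illustration under simplifying assumptions: 
segments are numbered 1, 2, 3, ... rather than identified by txid ranges, a 
JN's storage is modeled as a set of segment numbers, and the names holes, 
can_accept_writes and sync_missing are hypothetical, not anything in the 
HDFS-3077 patch.

```python
from typing import List, Set


def holes(held: Set[int]) -> List[int]:
    """Segments missing below the highest segment this JN holds."""
    if not held:
        return []
    return [i for i in range(1, max(held)) if i not in held]


def can_accept_writes(held: Set[int], latest_finalized: int) -> bool:
    """Suggestion 1: write only once every segment 1..latest_finalized is held."""
    return all(i in held for i in range(1, latest_finalized + 1))


def sync_missing(held: Set[int], latest_finalized: int) -> List[int]:
    """Suggestion 2: fetch whatever is missing; returns the segments copied."""
    missing = [i for i in range(1, latest_finalized + 1) if i not in held]
    held.update(missing)  # stands in for copying the segments from a peer JN
    return missing


# Without suggestion 1: a JN that was down for segment 2 rejoins and takes
# writes for segment 3, which leaves a hole.
print(holes({1, 3}))  # [2]

# With suggestion 1: the same JN holds only {1} until it has caught up.
jn = {1}
print(can_accept_writes(jn, 2))  # False
print(sync_missing(jn, 2))       # [2]
print(can_accept_writes(jn, 2))  # True
print(holes(jn))                 # []
```

Under rule 1, whatever a JN is missing is always a suffix of the segment 
sequence, so the sync in rule 2 never has to fill a gap in the middle.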


bq. If we wanted to improve this [deal with missing segments], however, ... If 
we merged NewEpoch and PrepareRecovery, that wouldn't be possible.
Todd, the way segments are playing out in our protocol is scaring me; 
ZooKeeper's ZAB avoids all this by recovering all previous transactions. It 
seems that segments have complicated our protocol significantly.
With the additional subtleties you have pointed out, I am worried that only a 
few people will be able to maintain this code.



                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: hdfs-3077-partial.txt, hdfs-3077-test-merge.txt, 
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
> hdfs-3077.txt, hdfs-3077.txt, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.pdf, qjournal-design.tex, qjournal-design.tex
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
