[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150647#comment-13150647
 ] 

Ivan Kelly commented on BOOKKEEPER-101:
---------------------------------------

{quote}
1. There is an API openLedger() that will trigger recovery if needed and open 
the ledger for reading. There is another API openLedgerNoRecovery() which will 
not trigger any recovery. Does it make sense to swap the semantics of these two 
calls? Intuitively, it makes more sense that an openLedger() call is kind of a 
non-destructive & idempotent call and will not trigger any state change on the 
servers. But an intelligent client (e.g. namenode) can invoke an 
openLedgerWithRecovery() call to fence off I/O from the original writer and 
bring the replicas in sync.
{quote}
Naming the non-recovery case #openLedger will invite people to use the 
non-recovery case normally, which is bad. openLedgerNoRecovery is unsafe, as it 
is possible that two reading clients will read a different sequence of entries 
(one being a prefix of the other), as the ledger isn't closed. This is why we 
put it out of the way. This was discussed a bit on BOOKKEEPER-11.
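
To make the prefix problem concrete, here is a rough sketch. It is a toy in-memory model, not BookKeeper code: ToyOpenLedger, readWithoutRecovery and PrefixDivergenceDemo are names made up for this example, and it only assumes that the original writer has not been fenced and can keep appending.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory stand-in for an open (unclosed) ledger; not the BookKeeper API.
final class ToyOpenLedger {
    private final List<String> entries = new ArrayList<>();

    // The writer is still alive and may append at any time.
    void append(String entry) {
        entries.add(entry);
    }

    // Analogue of a no-recovery open: returns whatever is there right now,
    // without closing the ledger or fixing its last entry.
    List<String> readWithoutRecovery() {
        return new ArrayList<>(entries);
    }
}

final class PrefixDivergenceDemo {
    // True when 'shorter' is a prefix of 'longer'.
    static boolean isPrefix(List<String> shorter, List<String> longer) {
        return shorter.size() <= longer.size()
                && longer.subList(0, shorter.size()).equals(shorter);
    }

    public static void main(String[] args) {
        ToyOpenLedger ledger = new ToyOpenLedger();
        ledger.append("e0");
        ledger.append("e1");
        List<String> readerA = ledger.readWithoutRecovery();
        ledger.append("e2"); // the writer was never fenced, so it keeps going
        List<String> readerB = ledger.readWithoutRecovery();
        System.out.println("A=" + readerA + " B=" + readerB
                + " A is a prefix of B: " + isPrefix(readerA, readerB));
    }
}
```

Reader A and reader B both believe they have read the ledger, but they disagree on where it ends. Opening with recovery closes the ledger first, so every later reader sees the same last entry.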

{quote}
2. Suppose there were three namenodes in the group. The active one is writing 
to a ledger. Suppose the primary namenode goes into a GC pause. Both 
standbys invoke openLedgerWithRecovery() on the same ledger. Is this use case 
supported? Will both the clients now start to execute the code to recover the 
ledger? The reason I ask this question is because the server does not record 
which client has fenced off I/O to the ledger. {quote}
Both will attempt to fence off the ledger. One will fail because once it has 
figured out the last entry, it will try to close the ledger, and see that the 
other has got there before it. The #openLedger call will fail on one and 
succeed on the other. 
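
A minimal sketch of that race, assuming the close is a conditional update on the ledger metadata. This is a toy model: ToyLedgerMetadata, Snapshot and tryClose are hypothetical names, and an AtomicReference stands in for the real metadata store.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy stand-in for ledger metadata; not the BookKeeper implementation.
final class ToyLedgerMetadata {
    enum State { OPEN, CLOSED }

    // Immutable snapshot; lastEntryId is only meaningful once CLOSED.
    static final class Snapshot {
        final State state;
        final long lastEntryId;

        Snapshot(State state, long lastEntryId) {
            this.state = state;
            this.lastEntryId = lastEntryId;
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(State.OPEN, -1L));

    Snapshot read() {
        return current.get();
    }

    // Conditional close: succeeds only if nobody changed the metadata
    // since 'seen' was read.
    boolean tryClose(Snapshot seen, long lastEntryId) {
        return current.compareAndSet(seen, new Snapshot(State.CLOSED, lastEntryId));
    }
}

final class RecoveryRaceDemo {
    public static void main(String[] args) {
        ToyLedgerMetadata metadata = new ToyLedgerMetadata();
        // Both standbys read the metadata while the ledger is still open.
        ToyLedgerMetadata.Snapshot seenByA = metadata.read();
        ToyLedgerMetadata.Snapshot seenByB = metadata.read();
        // Both work out the same last entry, then race to close.
        boolean aClosed = metadata.tryClose(seenByA, 7L);
        boolean bClosed = metadata.tryClose(seenByB, 7L);
        System.out.println("A closed: " + aClosed + ", B closed: " + bClosed);
    }
}
```

Whichever client gets its close in first wins; the other sees that the metadata has changed underneath it and its open fails.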

In terms of performance, recovering the ledger is not very heavy. We get the 
lastAddConfirmed and read forward until we get to the end. The difference 
between lastAddConfirmed and the end will be less than ensemble size. So the 
number of entries read and replicated will be about 4 or 5. 
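
Roughly, the read-forward step looks like the sketch below. It is a simplification under stated assumptions: ToyRecovery and findLastEntry are made-up names, and a plain map stands in for whatever entries can still be read from the bookies.

```java
import java.util.Map;
import java.util.TreeMap;

final class ToyRecovery {
    // Scans forward from lastAddConfirmed and returns the id of the last
    // entry that can still be read. The first missing id marks the end.
    static long findLastEntry(Map<Long, byte[]> stored, long lastAddConfirmed) {
        long last = lastAddConfirmed;
        while (stored.containsKey(last + 1)) {
            last++; // a real recovery would also re-replicate this entry here
        }
        return last;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> stored = new TreeMap<>();
        for (long id = 0; id <= 6; id++) {
            stored.put(id, new byte[] {(byte) id});
        }
        // The writer had confirmed up to entry 4 before it went away.
        System.out.println("last entry: " + findLastEntry(stored, 4L));
    }
}
```

The loop only walks the entries written after lastAddConfirmed, which is why the cost stays small.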
 
In the case where both try to recover and one finishes before the other starts, 
the second will see the ledger as closed and open it without error. In this 
case, the namenodes will compete for write access to the BK journal manager. 
HDFS-234 solves this with a distributed lock and a znode called inprogress, 
which records the current ledger.
                
> Add Fencing to Bookkeeper
> -------------------------
>
>                 Key: BOOKKEEPER-101
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-101
>             Project: Bookkeeper
>          Issue Type: New Feature
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.0.0
>
>         Attachments: BOOKKEEPER-101.diff, BOOKKEEPER-101.diff, 
> BOOKKEEPER-101.diff, BOOKKEEPER-101.diff, BOOKKEEPER-101.diff
>
>
> BookKeeper is designed for use as a write-ahead log (WAL). In systems with a 
> primary/backup architecture, the primary will write state updates to the WAL. 
> If the primary dies, the backup comes online, reads the WAL to get the latest 
> state and starts serving requests. However, if the primary was only 
> partitioned from the network, or stuck in a long GC, a split brain occurs. 
> Both primary and backup can service client requests. 
> Fencing (http://en.wikipedia.org/wiki/Fencing_%28computing%29) ensures that 
> this cannot happen. With fencing, the backup can close the WAL of the 
> primary, and cause any subsequent attempt by the primary to write to the WAL 
> to give an error. 
> We fence a ledger whenever it is opened by another client using 
> BookKeeper#openLedger. BookKeeper#openLedgerNoRecovery will not fence.
> The opening client marks the ledger as fenced in zookeeper, and then sends a 
> readEntry message to all of the bookies with the DO_FENCING flag set. Once at 
> least 1 bookie in each possible quorum of bookies has responded, we can 
> proceed with opening the ledger. Any subsequent attempt to write to the 
> ledger will fail as it will not be able to write to a quorum without one of 
> the bookies in the quorum responding with a ledger fenced error. The client 
> will also be unable to change the quorum without seeing that the ledger has 
> been marked as fenced in zookeeper.
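
The bookie side of the description above can be sketched as follows. This is a toy model, not the actual patch: ToyBookie, LedgerFencedException and coversEveryQuorum are hypothetical names, the boolean parameter stands in for the DO_FENCING flag, and the quorum check assumes that a quorum is a run of consecutive bookies in the ensemble.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical error type standing in for the "ledger fenced" response.
final class LedgerFencedException extends RuntimeException {
    LedgerFencedException(long ledgerId) {
        super("ledger " + ledgerId + " is fenced");
    }
}

// Toy bookie: remembers which ledgers are fenced and rejects writes to them.
final class ToyBookie {
    private final Set<Long> fenced = new HashSet<>();
    private final Map<Long, Map<Long, String>> ledgers = new HashMap<>();

    void addEntry(long ledgerId, long entryId, String data) {
        if (fenced.contains(ledgerId)) {
            throw new LedgerFencedException(ledgerId);
        }
        ledgers.computeIfAbsent(ledgerId, id -> new HashMap<>()).put(entryId, data);
    }

    // A read that optionally fences the ledger first (models DO_FENCING).
    // Returns null when the entry is not stored on this bookie.
    String readEntry(long ledgerId, long entryId, boolean doFencing) {
        if (doFencing) {
            fenced.add(ledgerId);
        }
        Map<Long, String> entries = ledgers.get(ledgerId);
        return entries == null ? null : entries.get(entryId);
    }
}

final class ToyFencing {
    // True when every run of quorumSize consecutive bookies (wrapping around
    // the ensemble) contains at least one bookie that has responded.
    static boolean coversEveryQuorum(boolean[] responded, int quorumSize) {
        int n = responded.length;
        for (int start = 0; start < n; start++) {
            boolean hit = false;
            for (int k = 0; k < quorumSize && !hit; k++) {
                hit = responded[(start + k) % n];
            }
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBookie bookie = new ToyBookie();
        bookie.addEntry(1L, 0L, "before fencing");
        bookie.readEntry(1L, 0L, true); // the opening client fences ledger 1
        try {
            bookie.addEntry(1L, 1L, "after fencing");
        } catch (LedgerFencedException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
    }
}
```

Once a bookie in every possible quorum has been fenced, the old writer cannot assemble a full quorum of successful writes, which is the property the description relies on.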

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
