[
https://issues.apache.org/jira/browse/BOOKKEEPER-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234262#comment-13234262
]
Ivan Kelly commented on BOOKKEEPER-112:
---------------------------------------
Flavio and I discussed this briefly yesterday evening, and after thinking about
it some more afterwards, the problem seems clearer to me now.
So, what is under discussion here is what we do with the final fragment of
an open ledger. This actually boils down to the same problem we have for
fencing. By recovering the bookie, we are introducing a second writer,
violating our 1-writer assumption. Since we now have more than one writer, it
is necessary for there to be a consensus among all writers on where the ledger
fragment ends.
There are 3 situations in which this open final fragment can occur.
# the original writer crashed, then the bookie crashed before ledger recovery
# the original writer has the ledger open, but has not written anything since
the bookie crashed
# the bookie being recovered isn't actually down
One solution proposed by Flavio yesterday was that we should wait until no open
final fragments exist before updating the ZK metadata. This works for 2.
However, for 1 & 3, the recovery will wait forever.
The approach I'm leaning towards now is to replicate all entries in the
fragment, and then ensure that no more entries are added to this specific
fragment. This would require a change to how fencing works. Instead of fencing
by ledger id, we would have to fence by fragment id. When the original writer
tries to write, the write will fail, and the writer will then try to replace
the bookies the write failed on (all bookies in this case). This deals with 1, because all
entries written before the writer crash will be replicated. It works for 2,
because the next write by the writer will see that its current ledger fragment
is fenced *and* that the crashed bookie is down, so it will build a new
ensemble and start writing a new fragment. It deals with 3, as the current
fragment will be rereplicated and any further attempts by the writer will force
it to rebuild its ensemble.
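
To make the fence-by-fragment idea concrete, here is a minimal in-memory sketch.
It is not BookKeeper code: ToyBookie, ToyWriter, fenceFragment and the
"ledgerId:firstEntryId" fragment key are all hypothetical names invented for
illustration, and the ensemble is reduced to a single bookie. It only shows
that fencing one fragment rejects further writes to that fragment and pushes
the writer onto a new fragment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FragmentFencingSketch {

    /** Toy in-memory bookie (hypothetical, not the real Bookie class). */
    static final class ToyBookie {
        // Fragments are keyed as "ledgerId:firstEntryId" (an assumed scheme).
        private final Set<String> fencedFragments = new HashSet<>();
        private final Map<String, Map<Long, String>> store = new HashMap<>();

        static String fragmentKey(long ledgerId, long firstEntryId) {
            return ledgerId + ":" + firstEntryId;
        }

        /** Fence one fragment; other fragments of the same ledger stay writable. */
        void fenceFragment(long ledgerId, long firstEntryId) {
            fencedFragments.add(fragmentKey(ledgerId, firstEntryId));
        }

        /** Returns false if the fragment is fenced, true if the entry was stored. */
        boolean addEntry(long ledgerId, long firstEntryId, long entryId, String data) {
            String key = fragmentKey(ledgerId, firstEntryId);
            if (fencedFragments.contains(key)) {
                return false;
            }
            store.computeIfAbsent(key, k -> new HashMap<>()).put(entryId, data);
            return true;
        }
    }

    /** Toy writer: a fenced write makes it open a new fragment on a new bookie. */
    static final class ToyWriter {
        private final long ledgerId;
        private long fragmentStart = 0; // first entry id of the current fragment
        private long nextEntryId = 0;
        private ToyBookie current;

        ToyWriter(long ledgerId, ToyBookie initial) {
            this.ledgerId = ledgerId;
            this.current = initial;
        }

        /** Writes one entry; returns the first entry id of the fragment it landed in. */
        long write(String data, ToyBookie replacement) {
            if (!current.addEntry(ledgerId, fragmentStart, nextEntryId, data)) {
                // Current fragment is fenced: switch bookie and start a new fragment
                // at the entry that failed.
                current = replacement;
                fragmentStart = nextEntryId;
                if (!current.addEntry(ledgerId, fragmentStart, nextEntryId, data)) {
                    throw new IllegalStateException("replacement bookie rejected the write");
                }
            }
            nextEntryId++;
            return fragmentStart;
        }
    }

    public static void main(String[] args) {
        ToyBookie b1 = new ToyBookie();
        ToyBookie b2 = new ToyBookie();
        ToyWriter writer = new ToyWriter(1L, b1);
        writer.write("e0", b2);            // lands in the fragment starting at 0
        b1.fenceFragment(1L, 0L);          // recovery fences the open fragment
        long frag = writer.write("e1", b2); // writer is forced onto a new fragment
        System.out.println("entry 1 written to fragment starting at " + frag);
    }
}
```

In the real system the fence would have to be applied on every bookie in the
fragment's ensemble and the new ensemble recorded in the ZK metadata; the
sketch leaves both out.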
> Bookie Recovery on an open ledger will cause LedgerHandle#close on that
> ledger to fail
> --------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-112
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-112
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Flavio Junqueira
> Assignee: Sijie Guo
> Fix For: 4.1.0
>
> Attachments: BK-112.patch, BOOKKEEPER-112.patch,
> BOOKKEEPER-112.patch_v2, BOOKKEEPER-112.patch_v3, BOOKKEEPER-112.patch_v4,
> BOOKKEEPER-112.patch_v5
>
>
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will
> not get notified of this update, so it will try to write out its own ledger
> metadata, only to fail with KeeperException.BadVersion. This effectively
> fences all write operations on the LedgerHandle (close and addEntry). close
> will fail for obvious reasons. addEntry will fail once it gets to the failed
> bookie in the schedule, tries to write, fails, selects a new bookie and tries
> to update ledger metadata.
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when
> done
> Also, uncomment addEntry in
> TestFencing#testFencingInteractionWithBookieRecovery()