[ https://issues.apache.org/jira/browse/BOOKKEEPER-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195727#comment-13195727 ]
[email protected] commented on BOOKKEEPER-112:
----------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3472/
-----------------------------------------------------------
(Updated 2012-01-29 09:58:30.513187)
Review request for bookkeeper.
Changes
-------
We need to check the ledger metadata status before proceeding with the recovery action.
For OPENED ledgers:
1) If the last ensemble contains the failed bookie, we should not proceed with
recovery, since we can't guarantee that the last entry is fully replicated
(there may also be other side effects).
2) If the last ensemble doesn't contain the failed bookie, it is safe to proceed
with recovery.
For IN_RECOVERY ledgers, we have to check whether the last ensemble contains
the failed bookie. If it does, the recovery tool has to help close the ledger,
since the normal BookKeeper client may fail to close it. (A corner case: 3
bookies (bk1, bk2, bk3), quorum size 3, ensemble size 3, and no entry written.
bk3 has failed. bk1 and bk2 return NoEntry and bk3 returns HandleNotAvailable,
so the ledger can't be closed.)
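The status checks above can be sketched as a small decision function. This is a minimal sketch, not the code in the patch: the class, enum and method names are hypothetical, bookies are plain strings, and the handling of IN_RECOVERY ledgers whose last ensemble does not contain the failed bookie (and of CLOSED ledgers) is an assumption, since the description does not spell it out.

```java
import java.util.List;

class RecoveryDecisionSketch {
    // Hypothetical mirror of the ledger metadata statuses named above.
    enum LedgerState { OPENED, IN_RECOVERY, CLOSED }

    // Hypothetical outcomes of the pre-recovery check.
    enum Action { SKIP, RECOVER, HELP_CLOSE }

    /**
     * Decides what the recovery tool does with one ledger.
     * lastEnsemble is the ensemble the writer is (or was last) adding entries to.
     */
    static Action decide(LedgerState state, List<String> lastEnsemble, String failedBookie) {
        boolean lastEnsembleAffected = lastEnsemble.contains(failedBookie);
        switch (state) {
            case OPENED:
                // Case 1): the last entry may not be fully replicated yet, so skip.
                // Case 2): only older ensembles are affected, so recovery is safe.
                return lastEnsembleAffected ? Action.SKIP : Action.RECOVER;
            case IN_RECOVERY:
                // A normal client may be unable to close this ledger, so the tool
                // helps close it. Recovering when the last ensemble is unaffected
                // is an assumption of this sketch.
                return lastEnsembleAffected ? Action.HELP_CLOSE : Action.RECOVER;
            default:
                // Assumption: CLOSED ledgers are recovered as before.
                return Action.RECOVER;
        }
    }

    public static void main(String[] args) {
        List<String> last = List.of("bk1", "bk2", "bk3");
        System.out.println(decide(LedgerState.OPENED, last, "bk3"));      // SKIP
        System.out.println(decide(LedgerState.OPENED, last, "bk4"));      // RECOVER
        System.out.println(decide(LedgerState.IN_RECOVERY, last, "bk3")); // HELP_CLOSE
    }
}
```

Keeping the check as a pure function of (state, last ensemble, failed bookie) makes each case from the description easy to test in isolation.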
For case 2) of OPENED ledgers, both PendingAddOp and BookKeeperAdmin need to
reread the metadata when they encounter BADVERSION and try to resolve the
conflict instead of closing the ledger.
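The reread-on-BADVERSION idea can be illustrated with an in-memory stand-in for the ledger's metadata znode. This is a minimal sketch under stated assumptions: VersionedStore, Snapshot, changeEnsemble and runScenario are hypothetical, the conditional write only mimics a versioned ZooKeeper update, and the conflict rule (accept the newer metadata only if its ensemble boundaries are unchanged) is an illustrative guess, not the rule implemented in PendingAddOp or BookKeeperAdmin. Only the writer side is shown; the admin side would follow the same reread-and-retry pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class RereadOnBadVersionSketch {
    // Stand-in for ZooKeeper's BADVERSION error (unchecked to keep the sketch short).
    static class BadVersionException extends RuntimeException {}

    // Metadata snapshot: ensembles keyed by first entry id, plus the version it was read at.
    static class Snapshot {
        final TreeMap<Long, List<String>> ensembles;
        final int version;
        Snapshot(TreeMap<Long, List<String>> ensembles, int version) {
            this.ensembles = ensembles;
            this.version = version;
        }
    }

    // Hypothetical in-memory replacement for the ledger's metadata znode.
    static class VersionedStore {
        private TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        private int version = 0;

        synchronized Snapshot read() { return new Snapshot(copy(ensembles), version); }

        // Conditional write: succeeds only if the caller saw the current version.
        synchronized int write(TreeMap<Long, List<String>> data, int expectedVersion) {
            if (expectedVersion != version) throw new BadVersionException();
            ensembles = copy(data);
            return ++version;
        }
    }

    static TreeMap<Long, List<String>> copy(TreeMap<Long, List<String>> m) {
        TreeMap<Long, List<String>> c = new TreeMap<>();
        m.forEach((k, v) -> c.put(k, new ArrayList<>(v)));
        return c;
    }

    // Writer-side ensemble change that rereads on BADVERSION instead of giving up.
    static Snapshot changeEnsemble(VersionedStore store, Snapshot local,
                                   long firstEntry, List<String> newEnsemble) {
        Snapshot base = local;
        for (int attempt = 0; attempt < 3; attempt++) {
            TreeMap<Long, List<String>> proposed = copy(base.ensembles);
            proposed.put(firstEntry, new ArrayList<>(newEnsemble));
            try {
                return new Snapshot(proposed, store.write(proposed, base.version));
            } catch (BadVersionException e) {
                Snapshot fresh = store.read();
                // Illustrative conflict rule: only accept the newer metadata if another
                // party (e.g. bookie recovery) swapped bookies inside existing ensembles
                // without adding or removing any.
                if (!fresh.ensembles.keySet().equals(base.ensembles.keySet())) throw e;
                base = fresh;
            }
        }
        throw new BadVersionException();
    }

    static Snapshot runScenario() {
        VersionedStore store = new VersionedStore();
        TreeMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, List.of("bk1", "bk2", "bk3"));
        initial.put(10L, List.of("bk1", "bk2", "bk4"));
        store.write(initial, 0);
        Snapshot writerView = store.read(); // LedgerHandle's cached copy, version 1

        // Bookie recovery replaces failed bk3 in the first ensemble: version 2.
        Snapshot recoveryView = store.read();
        recoveryView.ensembles.put(0L, List.of("bk1", "bk2", "bk5"));
        store.write(recoveryView.ensembles, recoveryView.version);

        // The writer then loses bk4 and changes ensemble at entry 20 from its stale copy.
        return changeEnsemble(store, writerView, 20L, List.of("bk1", "bk2", "bk6"));
    }

    public static void main(String[] args) {
        Snapshot result = runScenario();
        System.out.println(result.version + " " + result.ensembles);
    }
}
```

In this scenario the writer's first conditional write fails because recovery moved the version from 1 to 2; after rereading, the writer reapplies its new ensemble on top of the recovered metadata and the write succeeds at version 3, keeping both bk5 and bk6.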
Summary
-------
Bookie recovery updates the ledger metadata in ZooKeeper. LedgerHandle is not
notified of this update, so it will try to write out its own (now stale) ledger
metadata, only to fail with KeeperException.BadVersion. This effectively fences
all write operations on the LedgerHandle (close and addEntry). close fails
because it writes the metadata against the stale version. addEntry fails once
the schedule reaches the failed bookie: the write to that bookie fails, a new
bookie is selected, and the resulting ledger metadata update fails with
BadVersion.
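The delayed addEntry failure follows from how entries are spread over the ensemble. The sketch below assumes round-robin striping, where entry e is written to ensemble positions (e + i) % ensembleSize for i < quorumSize; WriteScheduleSketch, writeSet and firstEntryHitting are hypothetical names, not BookKeeper classes. With ensemble size 3 and quorum size 2, entry 0 goes only to bk1 and bk2, so a failed bk3 goes unnoticed until entry 1.

```java
import java.util.ArrayList;
import java.util.List;

class WriteScheduleSketch {
    // Ensemble positions that receive entryId under round-robin striping (assumed):
    // positions (entryId + i) % ensembleSize for i in [0, quorumSize).
    static List<Integer> writeSet(long entryId, int ensembleSize, int quorumSize) {
        List<Integer> set = new ArrayList<>(quorumSize);
        for (int i = 0; i < quorumSize; i++) {
            set.add((int) ((entryId + i) % ensembleSize));
        }
        return set;
    }

    // First entry at or after startEntry that must be written to failedIndex,
    // or -1 if none does within one full rotation of the ensemble.
    static long firstEntryHitting(int failedIndex, long startEntry, int ensembleSize, int quorumSize) {
        for (long e = startEntry; e < startEntry + ensembleSize; e++) {
            if (writeSet(e, ensembleSize, quorumSize).contains(failedIndex)) {
                return e;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> ensemble = List.of("bk1", "bk2", "bk3");
        for (long e = 0; e < 4; e++) {
            List<String> targets = new ArrayList<>();
            for (int idx : writeSet(e, ensemble.size(), 2)) {
                targets.add(ensemble.get(idx));
            }
            System.out.println("entry " + e + " -> " + targets);
        }
        // With bk3 (position 2) failed, the first write that notices is entry 1.
        System.out.println("first entry hitting bk3: " + firstEntryHitting(2, 0, ensemble.size(), 2));
    }
}
```

When the quorum size equals the ensemble size, as in the corner case above, every entry's write set covers every bookie, so the failure shows up on the very next write.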
When done, update line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes().
Also, uncomment addEntry in
TestFencing#testFencingInteractionWithBookieRecovery().
This addresses bug BOOKKEEPER-112.
https://issues.apache.org/jira/browse/BOOKKEEPER-112
Diffs (updated)
-----
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java 5bb37c3
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java 37623dc
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java 547e240
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java b403aa1
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java 56186ab
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerRecoveryOp.java 4625bbb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java 29070eb
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/ReadLastConfirmedOp.java 43e999d
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieRecoveryTest.java 8526db5
bookkeeper-server/src/test/java/org/apache/bookkeeper/client/TestFencing.java 015e4e4
bookkeeper-server/src/test/java/org/apache/bookkeeper/test/LedgerOpenTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/3472/diff
Testing
-------
Thanks,
Sijie
> Bookie Recovery on an open ledger will cause LedgerHandle#close on that
> ledger to fail
> --------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-112
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-112
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Flavio Junqueira
> Assignee: Ivan Kelly
> Fix For: 4.1.0
>
> Attachments: BK-112.patch, BOOKKEEPER-112.patch
--
This message is automatically generated by JIRA.