[
https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135611#comment-13135611
]
Sijie Guo commented on BOOKKEEPER-93:
-------------------------------------
Ivan,
> 2) This is unrelated to 1), so it should be in a separate JIRA. Also, I'm unsure
> the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is
> already synchronized.
You are right. readEntryComplete is synchronized, so there is no race condition there.
The actual issue is that readLastConfirmedComplete can be triggered twice:
{code:title=ReadLastConfirmedOp.java|borderStyle=solid}
// other return codes don't count as valid responses
if ((validResponses >= lh.metadata.quorumSize) && notComplete) {
    notComplete = false;
    if (LOG.isDebugEnabled()) {
        LOG.debug("Read Complete with enough validResponses");
    }
    cb.readLastConfirmedComplete(BKException.Code.OK, maxAddConfirmed, this.ctx);
    return;
}
if (numResponsesPending == 0) {
    // Have got all responses back but was still not enough, just fail the operation
    LOG.error("While readLastConfirmed ledger: " + ledgerId
            + " did not hear success responses from all quorums");
    cb.readLastConfirmedComplete(BKException.Code.LedgerRecoveryException,
            maxAddConfirmed, this.ctx);
}
{code}
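The second check fires readLastConfirmedComplete whenever numResponsesPending drops to 0,
without checking notComplete. So even after the first check has already completed the
operation with OK, the last outstanding response fails it a second time, as this log shows: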
{quote}
2011-10-26 09:34:48,874 - DEBUG - [pool-174-thread-1:ReadLastConfirmedOp@90] - Read Complete with enough validResponses
2011-10-26 09:34:48,874 - ERROR - [pool-174-thread-1:ReadLastConfirmedOp@97] - While readLastConfirmed ledger: 1 did not hear success responses from
{quote}
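One possible way to make the callback fire at most once is to check notComplete before both branches, so late responses are ignored once the operation has completed. The sketch below is a simplified, hypothetical model, not the actual patch: the class ReadLastConfirmedSketch, its Callback interface and the numeric result codes are stand-ins for ReadLastConfirmedOp, its callback and BKException.Code; only the field names follow the snippet above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of the response counting in ReadLastConfirmedOp.
 * Class, interface and result-code names are illustrative, not BookKeeper's API.
 */
public class ReadLastConfirmedSketch {
    /** Stand-in for the real callback; receives a result code and the max LAC seen. */
    interface Callback {
        void readLastConfirmedComplete(int rc, long maxAddConfirmed);
    }

    static final int OK = 0;
    static final int LEDGER_RECOVERY_ERROR = -1; // hypothetical value, not BKException's

    private final int quorumSize;
    private final Callback cb;
    private int numResponsesPending;
    private int validResponses = 0;
    private long maxAddConfirmed = -1;
    private boolean notComplete = true;

    ReadLastConfirmedSketch(int ensembleSize, int quorumSize, Callback cb) {
        this.numResponsesPending = ensembleSize;
        this.quorumSize = quorumSize;
        this.cb = cb;
    }

    /** Called once per bookie response; synchronized, so each call runs atomically. */
    synchronized void readEntryComplete(boolean success, long lastAddConfirmed) {
        numResponsesPending--;
        if (success) {
            validResponses++;
            maxAddConfirmed = Math.max(maxAddConfirmed, lastAddConfirmed);
        }
        if (!notComplete) {
            return; // already completed: ignore late responses, never call back twice
        }
        if (validResponses >= quorumSize) {
            notComplete = false;
            cb.readLastConfirmedComplete(OK, maxAddConfirmed);
            return;
        }
        if (numResponsesPending == 0) {
            notComplete = false; // also mark completion on the failure path
            cb.readLastConfirmedComplete(LEDGER_RECOVERY_ERROR, maxAddConfirmed);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        AtomicInteger lastRc = new AtomicInteger(Integer.MIN_VALUE);
        ReadLastConfirmedSketch op = new ReadLastConfirmedSketch(3, 2,
                (rc, lac) -> { calls.incrementAndGet(); lastRc.set(rc); });
        // Three bookies respond from separate threads; all succeed.
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            final long lac = i;
            threads[i] = new Thread(() -> op.readEntryComplete(true, lac));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("callbacks=" + calls.get() + " rc=" + lastRc.get());
    }
}
```

With ensemble size 3 and quorum size 2, the second successful response completes the operation with OK and the third is ignored, so main prints `callbacks=1 rc=0` regardless of thread ordering. Without the guard on the failure branch, the third response would reach numResponsesPending == 0 and fire the error callback as well, which matches the log above.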
> bookkeeper doesn't work correctly on OpenLedgerNoRecovery
> ---------------------------------------------------------
>
> Key: BOOKKEEPER-93
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-93
> Project: Bookkeeper
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Fix For: 3.4.0
>
> Attachments: bookkeeper-93.patch
>
>
> 1) BookKeeper hangs on openLedgerNoRecovery, since LedgerOpenOp doesn't
> trigger the callback when opening a ledger without recovery.
> 2) Race condition in ReadLastConfirmedOp.
> ReadLastConfirmedOp handles each response in readEntryComplete as follows:
> a) first decrement numResponsesPending
> b) then increment validResponses
> c) check validResponses and, if there are enough, call back with OK
> d) check numResponsesPending and, if it is 0, call back with LedgerRecoveryException
> Suppose two responses, A and B, arrive in readEntryComplete (quorum/ensemble
> size: 2):
> a) A first decrements numResponsesPending from 2 to 1.
> b) A increments validResponses from 0 to 1.
> c) B then decrements numResponsesPending from 1 to 0.
> d) A checks numResponsesPending before B checks validResponses, and finds that
> numResponsesPending is now 0, so A calls back with an exception. The correct
> behavior is for B to check validResponses and call back with OK.
> 3) If a LedgerHandle is opened by openLedgerNoRecovery, its lastAddConfirmed
> is set to -1, so every read request fails, because the requested entry id is
> greater than lastAddConfirmed.
> I therefore suggest that a LedgerHandle opened by openLedgerNoRecovery be put
> in an unsafeRead mode: close and write operations fail, and read operations
> skip the entry_id > lastAddConfirmed check.
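The unsafeRead mode suggested in 3) boils down to a few precondition checks. The following is a minimal, hypothetical sketch of those checks only: UnsafeReadSketch, checkRead, checkWrite and checkClose are illustrative names, not methods of LedgerHandle, and the real change would live in the existing read/write/close paths.

```java
/**
 * Hypothetical sketch of the "unsafeRead" mode proposed in 3).
 * Class and method names are illustrative, not part of the BookKeeper API.
 */
public class UnsafeReadSketch {
    private final boolean unsafeRead;     // true if opened via openLedgerNoRecovery
    private final long lastAddConfirmed;  // -1 for a handle opened without recovery

    UnsafeReadSketch(boolean openedWithoutRecovery, long lastAddConfirmed) {
        this.unsafeRead = openedWithoutRecovery;
        this.lastAddConfirmed = lastAddConfirmed;
    }

    /** Rejects invalid ranges; in unsafeRead mode the lastAddConfirmed bound is skipped. */
    void checkRead(long firstEntry, long lastEntry) {
        if (firstEntry < 0 || firstEntry > lastEntry) {
            throw new IllegalArgumentException("invalid range " + firstEntry + ".." + lastEntry);
        }
        if (!unsafeRead && lastEntry > lastAddConfirmed) {
            throw new IllegalArgumentException(
                    "entry " + lastEntry + " > lastAddConfirmed " + lastAddConfirmed);
        }
        // In unsafeRead mode the request is passed through; whether the
        // entries actually exist is left to the bookies.
    }

    /** Writes fail on a handle opened without recovery. */
    void checkWrite() {
        if (unsafeRead) {
            throw new UnsupportedOperationException("write on a ledger opened without recovery");
        }
    }

    /** Close fails as well, as the proposal requires. */
    void checkClose() {
        if (unsafeRead) {
            throw new UnsupportedOperationException("close on a ledger opened without recovery");
        }
    }

    public static void main(String[] args) {
        UnsafeReadSketch recovered = new UnsafeReadSketch(false, 10);
        recovered.checkRead(0, 10);   // allowed: within lastAddConfirmed
        UnsafeReadSketch noRecovery = new UnsafeReadSketch(true, -1);
        noRecovery.checkRead(0, 5);   // allowed even though lastAddConfirmed is -1
        try {
            noRecovery.checkWrite();
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the mode as a single flag fixed at open time means a handle cannot switch between safe and unsafe reads, so callers can rely on the checks being consistent for the handle's lifetime.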