[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135611#comment-13135611
 ] 

Sijie Guo commented on BOOKKEEPER-93:
-------------------------------------

Ivan,

> 2) This is unrelated to 1) so should be in a separate JIRA. Also, I'm unsure 
> the race you describe can occur. ReadLastConfirmedOp#readEntryComplete is 
> already synchronized.

You are right: readEntryComplete is synchronized, so there is no race condition on it.

The issue is that readLastConfirmedComplete is triggered twice.

{code:title=ReadLastConfirmedOp.java|borderStyle=solid}
        // other return codes dont count as valid responses
        if ((validResponses >= lh.metadata.quorumSize) &&
                notComplete) {
            notComplete = false;
            if (LOG.isDebugEnabled()) {
                LOG.debug("Read Complete with enough validResponses");
            }
            cb.readLastConfirmedComplete(BKException.Code.OK, maxAddConfirmed, this.ctx);
            return;
        }

        if (numResponsesPending == 0) {
            // Have got all responses back but was still not enough, just fail the operation
            LOG.error("While readLastConfirmed ledger: " + ledgerId + " did not hear success responses from all quorums");
            cb.readLastConfirmedComplete(BKException.Code.LedgerRecoveryException, maxAddConfirmed, this.ctx);
        }
{code}

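The second check does not test notComplete, so once the last response arrives 
it triggers readLastConfirmedComplete whether or not there were enough valid 
responses, even if the callback has already been delivered with OK.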

{quote}
2011-10-26 09:34:48,874 - DEBUG - [pool-174-thread-1:ReadLastConfirmedOp@90] - Read Complete with enough validResponses
2011-10-26 09:34:48,874 - ERROR - [pool-174-thread-1:ReadLastConfirmedOp@97] - While readLastConfirmed ledger: 1 did not hear success responses from
{quote}
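
One possible way to avoid the second callback is to make both checks depend on notComplete. The following is only a minimal, self-contained sketch of that idea, not the attached patch: the class name ReadLastConfirmedSketch, the delivered list (standing in for cb.readLastConfirmedComplete), and the numeric return codes are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ReadLastConfirmedOp; not the real BookKeeper class.
class ReadLastConfirmedSketch {
    // Illustrative stand-ins for BKException.Code values; the numbers are arbitrary.
    public static final int OK = 0;
    public static final int LEDGER_RECOVERY_EXCEPTION = -1;

    private final int quorumSize;
    private int numResponsesPending;
    private int validResponses = 0;
    private boolean notComplete = true;

    // Records each result that would be passed to cb.readLastConfirmedComplete.
    public final List<Integer> delivered = new ArrayList<>();

    public ReadLastConfirmedSketch(int ensembleSize, int quorumSize) {
        this.numResponsesPending = ensembleSize;
        this.quorumSize = quorumSize;
    }

    public synchronized void readEntryComplete(int rc) {
        numResponsesPending--;
        if (rc == OK) {
            validResponses++;
        }

        // Guard both outcomes: once a result has been delivered, do nothing more.
        if (!notComplete) {
            return;
        }

        if (validResponses >= quorumSize) {
            notComplete = false;
            delivered.add(OK);
            return;
        }

        if (numResponsesPending == 0) {
            // All responses are back but there were not enough valid ones.
            notComplete = false;
            delivered.add(LEDGER_RECOVERY_EXCEPTION);
        }
    }

    public static void main(String[] args) {
        // Ensemble of 2, quorum of 1: the first OK completes the operation,
        // and the second response must not produce another callback.
        ReadLastConfirmedSketch op = new ReadLastConfirmedSketch(2, 1);
        op.readEntryComplete(OK);
        op.readEntryComplete(OK);
        System.out.println(op.delivered);
    }
}
```

In this sketch the second response finds notComplete already false and returns early, so only one result is recorded.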
                
> bookkeeper doesn't work correctly on OpenLedgerNoRecovery
> ---------------------------------------------------------
>
>                 Key: BOOKKEEPER-93
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-93
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 3.4.0
>
>         Attachments: bookkeeper-93.patch
>
>
> 1) bookkeeper hangs on openLedgerNoRecovery, since LedgerOpenOp doesn't 
> trigger the callback when opening a ledger without recovery.
> 2) race condition in ReadLastConfirmedOp
> ReadLastConfirmedOp calls back on readEntryComplete:
> a) first decrement numResponsesPending
> b) then increment validResponses
> c) check validResponses to call back with OK
> d) check numResponsesPending to call back with LedgerRecoveryException
> Suppose two callbacks, A and B, return on readEntryComplete (quorum/ensemble 
> size: 2):
> a) A first decrements numResponsesPending from 2 to 1.
> b) A increments validResponses from 0 to 1.
> c) B then decrements numResponsesPending from 1 to 0.
> d) A checks numResponsesPending before B checks validResponses, and A finds 
> that numResponsesPending is now 0. A will call back with the exception, but 
> the right action is for B to check validResponses and call back with OK.
> 3) if a LedgerHandle is opened by openLedgerNoRecovery, lastAddConfirmed 
> is set to -1, so all read requests fail since the requested entry id > 
> lastAddConfirmed.
> So I suggest that if a LedgerHandle is opened by openLedgerNoRecovery, the 
> LedgerHandle is put in unsafeRead mode: close/write operations fail, and 
> read operations do not check the condition entry_id > lastAddConfirmed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
