yapxue opened a new issue, #15782:
URL: https://github.com/apache/pulsar/issues/15782

   **Describe the bug**
   When a broker shuts down suddenly, the current recovery policy may behave inconsistently. For example, suppose the broker holds ledger L1 whose LAC is E0, it has pending writes E1 and E2, and the write set (WQ) is B1, B2, B3. E1 has been written only to B1, and E2 only to B3. Before the BookKeeper client opens a new ledger, it has to recover L1. Recovery follows two principles:
   - Principle 1: one positive bookie is enough to judge the entry recoverable.
   - Principle 2: (WQ - AQ) + 1 negative bookies are enough to judge the entry unrecoverable.

   A positive bookie is one that has the entry; a negative bookie is one that responds NoSuchEntry when queried for it.
   With WQ = 3 and AQ = 2, two negative responses are enough to decide. So when only one bookie has the entry, the entry is judged recoverable if that bookie's response arrives before the other two, but judged unrecoverable and truncated if it arrives after the other two. The same condition can therefore lead to two different results.
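
A minimal Python sketch of the two principles, assuming WQ = 3 and AQ = 2 and modelling each bookie reply as a boolean (True = has the entry, False = NoSuchEntry). The function `recovery_decision` and its string results are hypothetical illustrations, not the BookKeeper client API; the sketch only shows that the verdict depends on the order in which replies arrive.

```python
from typing import Iterable

RECOVERABLE = "recoverable"
UNRECOVERABLE = "unrecoverable"
NEED_MORE = "need more responses"


def recovery_decision(replies: Iterable[bool], wq: int = 3, aq: int = 2) -> str:
    """Apply the two principles to bookie replies in arrival order.

    True means the bookie has the entry (positive);
    False means it answered NoSuchEntry (negative).
    """
    negatives = 0
    for has_entry in replies:
        if has_entry:
            # Principle 1: a single positive bookie is enough.
            return RECOVERABLE
        negatives += 1
        if negatives >= (wq - aq) + 1:
            # Principle 2: (WQ - AQ) + 1 negatives decide "unrecoverable".
            return UNRECOVERABLE
    # Neither threshold reached yet: keep waiting for replies.
    return NEED_MORE


# Same stored state (one of three bookies has the entry), different arrival order.
print(recovery_decision([True, False, False]))   # recoverable
print(recovery_decision([False, False, True]))   # unrecoverable
```

In this sketch the replies `[False, True, False]` also yield recoverable, so the entry is truncated only when the single positive reply arrives last.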
   
   Here is an example based on the scenario above. While recovering L1, the client may see either of the following two cases.
   
   case 1 (reading E1, which only B1 has):

   | Time | B1 | B2 | B3 | Recoverable? |
   |------|----|----|----|--------------|
   | t1 | OK | Waiting response | Waiting response | Yes |
   
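   case 2 (reading E2, which only B3 has):

   | Time | B1 | B2 | B3 | Recoverable? |
   |------|----|----|----|--------------|
   | t1 | NoSuchEntry | Waiting response | Waiting response | Need more response |
   | t2 | | NoSuchEntry | Waiting response | No |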
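
The two cases can be replayed against the scenario's bookie contents with a small Python sketch. It assumes AQ = 2 (so two NoSuchEntry replies decide the outcome) and models each case as the order in which bookies reply. `STORED`, `replay`, and the reading that case 1 concerns E1 while case 2 concerns E2 are illustrative assumptions, not part of Pulsar or BookKeeper.

```python
# Assumed bookie contents for the pending entries: E1 only on B1, E2 only on B3.
STORED = {"B1": {"E1"}, "B2": set(), "B3": {"E2"}}
WQ, AQ = 3, 2


def replay(entry: str, arrival_order: list[str]) -> str:
    """Return the recovery verdict after replies arrive in the given order."""
    negatives = 0
    for bookie in arrival_order:
        if entry in STORED[bookie]:
            return "Yes"          # one positive bookie: recoverable
        negatives += 1            # this bookie answered NoSuchEntry
        if negatives >= WQ - AQ + 1:
            return "No"           # enough negatives: unrecoverable, entry truncated
    return "Need more response"


# Case 1: B1 replies first while B2 and B3 are still pending.
print(replay("E1", ["B1"]))          # Yes
# Case 2: B1 replies at t1, B2 at t2; B3 (which has E2) has not replied yet.
print(replay("E2", ["B1"]))          # Need more response
print(replay("E2", ["B1", "B2"]))    # No
```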
   
   Case 1 and case 2 have different results. In case 1, the client considers the entry not successfully written, yet a bookie has the entry, which can cause inconsistency between the client and the server.
   
   
   **Expected behavior**
   Under the same condition, recovery should always produce the same result, rather than sometimes recovering the entry and sometimes not.
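
The expected behavior can be phrased as a property: for a fixed set of bookie replies, the verdict should not depend on arrival order. Below is a Python sketch that checks this by trying every permutation of the replies, assuming WQ = 3, AQ = 2 and the two principles as stated above. `current_rule`, `verdicts`, and `is_order_independent` are hypothetical helpers for illustration only; under the current rule, the one-positive, two-negative condition fails the check.

```python
from itertools import permutations
from typing import Callable, Sequence

Rule = Callable[[Sequence[bool]], str]


def current_rule(replies: Sequence[bool], wq: int = 3, aq: int = 2) -> str:
    """Principles 1 and 2 applied to replies in arrival order (True = has entry)."""
    negatives = 0
    for has_entry in replies:
        if has_entry:
            return "recoverable"
        negatives += 1
        if negatives >= wq - aq + 1:
            return "unrecoverable"
    return "need more"


def verdicts(rule: Rule, replies: Sequence[bool]) -> set[str]:
    """Collect every verdict the rule reaches over all arrival orders."""
    return {rule(order) for order in permutations(replies)}


def is_order_independent(rule: Rule, replies: Sequence[bool]) -> bool:
    """True if every arrival order of the same replies gives one verdict."""
    return len(verdicts(rule, replies)) == 1


same_condition = [True, False, False]   # exactly one of three bookies has the entry
print(sorted(verdicts(current_rule, same_condition)))      # ['recoverable', 'unrecoverable']
print(is_order_independent(current_rule, same_condition))  # False
```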
   
   


-- 
This is an automated message from the Apache Git Service.