sijie commented on issue #3279: Error while recovering ledger when send 
messages by producer
URL: https://github.com/apache/pulsar/issues/3279#issuecomment-450814189
 
 
   I worked with @codelipenghui on debugging this issue. When the issue 
occurred, the following log statements were found on the broker.
   
   ```
   15:10:00.028 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  
org.apache.bookkeeper.client.LedgerHandle - Conditional update ledger metadata 
for ledger 153718 failed.
   15:10:00.029 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  
org.apache.bookkeeper.client.LedgerRecoveryOp - Close ledger 153718 failed 
during recovery:
   ```
   
   So the `LedgerRecoveryException` received on the producer side comes from 
the broker failing to update ledger metadata while loading a topic: 
*Conditional update ledger metadata for ledger 153718 failed*.
   
   **Why this happened**
   
   The *Conditional update failure* can happen when ownership of a topic is 
transferred from one broker to another. The transfer can be triggered by 
various events, for example topic reassignment during a network partition, or 
load balancing.
   
   During this period, the old owner is unloading the topic and closing the 
last ledger in the topic. Closing the ledger involves updating ledger 
metadata. Meanwhile, the new owner is loading the topic and recovering the 
last ledger in the topic. At the end of recovery, it also closes the ledger 
(which also updates ledger metadata). These concurrent metadata updates 
trigger the "conditional update" failure: one update succeeds and the other 
fails. Neither the broker nor the client has retry logic for this case, so the 
exception is propagated all the way back to the application.
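
The race described above can be sketched with a versioned compare-and-set. This is a minimal illustration, not BookKeeper code: `Store`, `Versioned`, and `conditionalUpdate` are hypothetical names, and an in-memory `AtomicReference` stands in for the metadata store.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: Store and Versioned are hypothetical, not BookKeeper classes.
final class ConditionalUpdateDemo {

    // Immutable metadata snapshot paired with the version it was read at.
    static final class Versioned {
        final String state;
        final long version;

        Versioned(String state, long version) {
            this.state = state;
            this.version = version;
        }
    }

    // In-memory stand-in for the metadata store.
    static final class Store {
        private final AtomicReference<Versioned> current =
                new AtomicReference<>(new Versioned("OPEN", 0L));

        Versioned read() {
            return current.get();
        }

        // Applies the write only if the stored version still matches expectedVersion.
        boolean conditionalUpdate(String newState, long expectedVersion) {
            Versioned seen = current.get();
            if (seen.version != expectedVersion) {
                return false; // someone else updated the record first
            }
            return current.compareAndSet(seen, new Versioned(newState, expectedVersion + 1));
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        // Both brokers read the metadata before either one writes.
        long seenByOldOwner = store.read().version;
        long seenByNewOwner = store.read().version;
        boolean oldOwnerClosed = store.conditionalUpdate("CLOSED", seenByOldOwner);
        boolean newOwnerClosed = store.conditionalUpdate("CLOSED", seenByNewOwner);
        System.out.println("old owner close succeeded: " + oldOwnerClosed);
        System.out.println("new owner close succeeded: " + newOwnerClosed);
    }
}
```

Whichever broker writes second holds a stale version, so its update is rejected even though both wanted the same outcome.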
   
   This issue will be fixed by the bk 4.9.0 release, since bk 4.9.0 will handle 
the conditional update failure on close and will not throw an exception. (/cc 
@ivankelly for confirmation)
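
The comment does not say how bk 4.9.0 handles the failure. One plausible approach, shown here purely as an assumption, is to re-read the metadata after a rejected update and treat the close as done if another writer already closed the ledger at the same last entry. `Meta`, `Store`, and `closeLedger` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: these types are hypothetical, not the BookKeeper API.
final class TolerantCloseDemo {

    // Immutable ledger metadata snapshot.
    static final class Meta {
        final boolean closed;
        final long lastEntryId;

        Meta(boolean closed, long lastEntryId) {
            this.closed = closed;
            this.lastEntryId = lastEntryId;
        }
    }

    // In-memory stand-in for the metadata store; object identity acts as the version.
    static final class Store {
        private final AtomicReference<Meta> current =
                new AtomicReference<>(new Meta(false, -1L));

        Meta read() {
            return current.get();
        }

        // Succeeds only if nobody replaced the snapshot the caller read.
        boolean conditionalUpdate(Meta expected, Meta next) {
            return current.compareAndSet(expected, next);
        }
    }

    // Returns true if the ledger ends up closed at lastEntryId, whoever closed it.
    static boolean closeLedger(Store store, Meta snapshot, long lastEntryId) {
        if (store.conditionalUpdate(snapshot, new Meta(true, lastEntryId))) {
            return true;
        }
        // Update rejected: check whether the other writer reached the same result.
        Meta latest = store.read();
        return latest.closed && latest.lastEntryId == lastEntryId;
    }

    public static void main(String[] args) {
        Store store = new Store();
        Meta seenByOldOwner = store.read();
        Meta seenByNewOwner = store.read();
        System.out.println("old owner: " + closeLedger(store, seenByOldOwner, 41L));
        System.out.println("new owner: " + closeLedger(store, seenByNewOwner, 41L));
    }
}
```

In this sketch a conflict is still reported as a failure when the two writers disagree on the last entry, since that would no longer be a harmless duplicate close.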
   
   However, I think there are a few improvements that can be considered on the 
Pulsar side.
   
   For example, the producer could catch this exception and determine whether 
it is retriable. If it is, the producer can retry before propagating the 
exception to the application.
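
A minimal sketch of that retry idea, written against plain Java interfaces rather than the Pulsar client API. `sendWithRetry`, the `isRetriable` predicate, and the message-matching check in `main` are all hypothetical; the comment does not specify which exception type the producer would actually inspect.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Illustrative only: a generic retry wrapper, not part of the Pulsar client.
final class RetryingSendDemo {

    // Calls send until it succeeds, the error is not retriable, or attempts run out.
    static <T> T sendWithRetry(Callable<T> send,
                               Predicate<Exception> isRetriable,
                               int maxAttempts,
                               long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return send.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetriable.test(e)) {
                    throw e; // give up and surface the error to the application
                }
                Thread.sleep(backoffMillis * attempt); // linear backoff between attempts
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated send: fails twice with a recovery error, then succeeds.
        Callable<String> send = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("LedgerRecoveryException");
            }
            return "msg-1";
        };
        // Assumption: the retriable error is recognizable from its message.
        Predicate<Exception> isRetriable =
                e -> e.getMessage() != null && e.getMessage().contains("LedgerRecoveryException");
        String id = sendWithRetry(send, isRetriable, 5, 10L);
        System.out.println("sent " + id + " after " + calls[0] + " attempts");
    }
}
```

Bounding the attempts keeps a persistent failure from hanging the caller, and the predicate keeps non-transient errors from being retried at all.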
