sijie commented on issue #3279: Error while recovering ledger when send messages by producer
URL: https://github.com/apache/pulsar/issues/3279#issuecomment-450814189

I have worked with @codelipenghui on debugging this issue. When the issue happened, the following log statements were found on the broker.

```
15:10:00.028 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.LedgerHandle - Conditional update ledger metadata for ledger 153718 failed.
15:10:00.029 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.LedgerRecoveryOp - Close ledger 153718 failed during recovery:
```

So the `LedgerRecoveryException` received on the producer side comes from the broker failing to update ledger metadata when loading a topic: `Conditional update ledger metadata for ledger 153718 failed`.

**Why this happened**

The *conditional update failure* can happen when ownership of a topic is transferred from one broker to another. The transfer can be triggered by various events, for example a topic being reassigned during a network partition, or load balancing.

During this period, the old owner is unloading the topic and closing the last ledger in the topic. Closing the ledger involves updating ledger metadata. At the same time, the new owner is loading the topic and recovering the last ledger in the topic. At the end of recovery, it also closes the ledger, which again updates ledger metadata.

These concurrent metadata updates trigger the "conditional update" failure: one update succeeds and the other fails. Neither the broker nor the client has retry logic for this case, so the exception is propagated all the way back to the application.

This issue will be fixed by the BookKeeper 4.9.0 release, since BookKeeper 4.9.0 will handle a conditional update failure on closing and will not throw an exception. (/cc @ivankelly for confirmation)

However, I think there are a few improvements that can be considered on the Pulsar side. For example, the producer can potentially catch this exception and determine whether it is retriable. If it is, the producer can retry before propagating the exception to the application.
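
To make the race concrete, here is a minimal sketch of a versioned metadata store in which two owners try to close the same ledger from the same stale snapshot. Everything in it (`MetadataStore`, `VersionConflict`, `close_ledger`, `close_ledger_tolerant`) is a hypothetical toy and not the BookKeeper or Pulsar API. The tolerant variant shows only one possible way to treat the conflict as benign; it is an assumption, not a description of what BookKeeper 4.9.0 actually does.

```python
from dataclasses import dataclass, replace


class VersionConflict(Exception):
    """Raised when a conditional update is attempted with a stale version."""


@dataclass(frozen=True)
class LedgerMetadata:
    state: str = "OPEN"       # "OPEN" or "CLOSED"
    last_entry_id: int = -1   # last entry agreed on at close time
    version: int = 0          # bumped on every successful update


class MetadataStore:
    """Toy versioned store (hypothetical stand-in for the real metadata service)."""

    def __init__(self):
        self._ledgers = {}

    def create(self, ledger_id):
        self._ledgers[ledger_id] = LedgerMetadata()

    def read(self, ledger_id):
        return self._ledgers[ledger_id]

    def conditional_update(self, ledger_id, new_metadata, expected_version):
        # The write only succeeds if nobody has updated the ledger since
        # the caller read it (compare-and-set on the version).
        current = self._ledgers[ledger_id]
        if current.version != expected_version:
            raise VersionConflict(
                f"ledger {ledger_id}: expected v{expected_version}, found v{current.version}"
            )
        self._ledgers[ledger_id] = replace(new_metadata, version=expected_version + 1)


def close_ledger(store, ledger_id, last_entry_id, snapshot):
    """Close with no conflict handling: a conflict propagates to the caller."""
    closed = replace(snapshot, state="CLOSED", last_entry_id=last_entry_id)
    store.conditional_update(ledger_id, closed, snapshot.version)


def close_ledger_tolerant(store, ledger_id, last_entry_id, snapshot):
    """Close, treating a conflict as success if the ledger is already
    closed with the same last entry (assumed policy, for illustration)."""
    try:
        close_ledger(store, ledger_id, last_entry_id, snapshot)
    except VersionConflict:
        latest = store.read(ledger_id)
        if latest.state == "CLOSED" and latest.last_entry_id == last_entry_id:
            return  # the other owner already reached the same outcome
        raise


if __name__ == "__main__":
    store = MetadataStore()
    store.create(153718)

    # Both brokers read the metadata before either one writes.
    old_owner_view = store.read(153718)
    new_owner_view = store.read(153718)

    close_ledger(store, 153718, 41, old_owner_view)      # old owner wins
    try:
        close_ledger(store, 153718, 41, new_owner_view)  # new owner is stale
    except VersionConflict as e:
        print("conditional update failed:", e)

    # With the tolerant close, the same stale attempt is not an error.
    close_ledger_tolerant(store, 153718, 41, new_owner_view)
    print(store.read(153718))
```

The same check could in principle live in a retry path on the broker or producer side: re-read the state after a conflict, and only surface the exception if the observed outcome differs from the intended one.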
