xiang092689 commented on issue #17812: URL: https://github.com/apache/pulsar/issues/17812#issuecomment-1647335052
We also encountered this problem in version 2.11.0 (w:2, r:2, a:1).

The scenario: we rebooted a machine that hosts one bookie and one broker. After rebalancing, the new topic owner opens the metadata ledger.

1. Cursor recovery fails with LedgerRecoveryException
   - The BK client fails to open the metadata ledger with (-8, 0) and then returns LedgerRecoveryException (-10).
   - Pulsar catches the exception, initializes the cursor with the earliest position recorded in ZooKeeper, and sets the cursor state to NoLedger.
   - The cursor is then closed because recovery failed.
   - When a cursor closes, the broker persists the mark-delete position to ZooKeeper if the cursor state is not Closed or Closing.
   - There is an additional side effect: because opening the metadata ledger failed, the cursor initialized its cursorLedger as null, so the cursor ledger ID in ZooKeeper is set to -1 at the same time.

2. The cursor is reset to the earliest position
   - In the next recovery round, the broker creates a new cursor ledger starting from the earliest position, which was persisted when the cursor closed in step 1.

That is the whole story. I think the broker's process is fine, but there should be some tolerance when we hit LedgerRecoveryException. The BookKeeper client returns LedgerRecoveryException for any return code other than "timeout" and "authenticate failed", which covers too many failure cases and makes the problem easier to reproduce.

Actually, I don't know how to fix it properly.
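
To make the two recovery rounds easier to follow, here is a toy model of the sequence described in this comment. It is only a sketch: `CursorRecoverySketch`, `StoredCursor`, `recoverThenClose` and the `tolerant` flag are hypothetical names and not Pulsar or BookKeeper APIs, positions are reduced to a single number, and the ledger ID 42 and position 1000 are made-up values. The `tolerant` branch shows just one possible reading of "some tolerance" (leave the ZooKeeper record untouched when the open fails), not a proposed patch.

```java
import java.util.OptionalLong;

/** Toy model of the two recovery rounds; not real Pulsar/BookKeeper code. */
public class CursorRecoverySketch {
    static final long NO_LEDGER = -1L;          // cursor ledger ID written to ZooKeeper after the failure
    static final int RC_OK = 0;
    static final int RC_LEDGER_RECOVERY = -10;  // return code quoted in the comment

    /** What the simulated ZooKeeper node stores for one cursor. */
    static final class StoredCursor {
        long cursorLedgerId;
        long markDeletePosition;

        StoredCursor(long cursorLedgerId, long markDeletePosition) {
            this.cursorLedgerId = cursorLedgerId;
            this.markDeletePosition = markDeletePosition;
        }
    }

    /**
     * One recovery round followed by a cursor close.
     * Returns the position the cursor resumes from, or empty if recovery
     * was deferred (hypothetical tolerant mode only).
     */
    static OptionalLong recoverThenClose(StoredCursor zk, int openRc,
                                         long positionInLedger, boolean tolerant) {
        if (zk.cursorLedgerId == NO_LEDGER) {
            // No cursor ledger recorded: start from the position stored in ZooKeeper.
            return OptionalLong.of(zk.markDeletePosition);
        }
        if (openRc == RC_OK) {
            // Normal case: the up-to-date position comes from the cursor ledger.
            return OptionalLong.of(positionInLedger);
        }
        if (tolerant) {
            // Assumed alternative: keep the ZooKeeper record as-is and retry later.
            return OptionalLong.empty();
        }
        // Behaviour described in step 1: fall back to the ZooKeeper position,
        // and on close persist it together with cursor ledger ID -1.
        zk.cursorLedgerId = NO_LEDGER;
        return OptionalLong.of(zk.markDeletePosition);
    }

    /** Round 1 fails to open the cursor ledger, round 2 succeeds. */
    static long twoRounds(boolean tolerant) {
        StoredCursor zk = new StoredCursor(42L, 0L); // ZooKeeper only knows the earliest position
        long positionInLedger = 1000L;               // real progress lives in the cursor ledger
        recoverThenClose(zk, RC_LEDGER_RECOVERY, positionInLedger, tolerant);
        return recoverThenClose(zk, RC_OK, positionInLedger, tolerant).getAsLong();
    }

    public static void main(String[] args) {
        System.out.println("described behaviour resumes at " + twoRounds(false));
        System.out.println("tolerant variant resumes at " + twoRounds(true));
    }
}
```

In this model the described behaviour resumes at position 0 after the second round, because the first round overwrote the cursor ledger ID with -1, while the tolerant variant still finds ledger 42 and resumes at 1000.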
