poorbarcode opened a new pull request, #16841: URL: https://github.com/apache/pulsar/pull/16841
### Motivation

If the meta-ledger fails to initialize while mark delete is executing, the mark delete request times out after 20s with a probability of roughly 1 in 1000. You can reproduce it by running the unit test `ManagedCursorTest.markDeleteWithErrors` 1000 times.

When the problem occurs, the actual execution process is as follows:

| Time | `cursor mark deleted` | `meta thread` |
| ----------- | ----------- | ----------- |
| 1 | check meta-ledger state | |
| 2 | do create ledger | |
| 3 | | create ledger fail |
| 4 | | loop pending requests, and fail callback |
| 5 | append to pending requests queue | |
| 6 | waiting callback... | |
| 7 | after the 20s... | |
| 8 | timeout ex | |

- Each column represents an individual thread.
- The Time column only indicates the order of the steps, not actual time.
- The important steps are explained below:

step-4: If the ledger fails to be created, a "fail back" is triggered for the pending requests; requests that have not yet been queued are ignored.

https://github.com/apache/pulsar/blob/c217b8f559292fd34c6a4fb4b30aab213720d962/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L2570-L2577

step-5 ( <strong>Highlight</strong> ): If the meta-ledger needs to be created, `create ledger` is triggered first and the current request is then put into the `pending requests queue`. Step 4 may already have completed before the request is put into the queue, in which case the request never receives a callback.

https://github.com/apache/pulsar/blob/c217b8f559292fd34c6a4fb4b30aab213720d962/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1870-L1876

### Modifications

When a ledger needs to be created, make `append to pending requests queue` and `create ledger` execute serially.

### Documentation

Check the box below or label this PR directly.

Need to update docs?
- [ ] `doc-required` (Your PR needs to update docs and you will update later)
- [x] `doc-not-needed` (Please explain why)
- [ ] `doc` (Your PR contains doc changes)
- [ ] `doc-complete` (Docs have been already added)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
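
The ordering problem described under Motivation, and the serialization described under Modifications, can be illustrated with a minimal, single-threaded sketch. This is not the code from `ManagedCursorImpl`: the class `PendingQueueSketch` and the methods `failPending`, `markDeleteRacy`, and `markDeleteSerialized` are hypothetical names, and the failing `createLedger` runnable simply stands in for the case where the meta thread finishes step 4 before step 5 runs. Enqueueing before triggering the creation is one possible reading of "execute serially", assumed here for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical stand-in for the cursor's pending-request handling; not Pulsar code.
class PendingQueueSketch {
    private final Queue<Consumer<String>> pending = new ArrayDeque<>();

    // Step 4: on ledger-creation failure, fail every request queued so far.
    synchronized void failPending(String reason) {
        Consumer<String> callback;
        while ((callback = pending.poll()) != null) {
            callback.accept(reason);
        }
    }

    // Problematic ordering: creation is triggered before the request is queued.
    void markDeleteRacy(Consumer<String> callback, Runnable createLedger) {
        createLedger.run();          // steps 2-4 may fully complete here
        synchronized (this) {
            pending.add(callback);   // step 5: queued too late, never called back
        }
    }

    // Serialized ordering (assumed fix): queue the request, then trigger creation.
    void markDeleteSerialized(Consumer<String> callback, Runnable createLedger) {
        synchronized (this) {
            pending.add(callback);
        }
        createLedger.run();          // a failure now sees the queued request
    }

    // Runs one mark delete against a ledger creation that fails immediately
    // and returns the errors delivered to the callback.
    static List<String> run(boolean serialized) {
        PendingQueueSketch cursor = new PendingQueueSketch();
        List<String> errors = new ArrayList<>();
        Runnable failingCreate = () -> cursor.failPending("create ledger failed");
        if (serialized) {
            cursor.markDeleteSerialized(errors::add, failingCreate);
        } else {
            cursor.markDeleteRacy(errors::add, failingCreate);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("racy: " + run(false));
        System.out.println("serialized: " + run(true));
    }
}
```

In the racy ordering the callback is never invoked, which corresponds to the caller waiting until the 20s timeout; in the serialized ordering the failure is delivered to the callback immediately.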
