kezhuw commented on issue #7517: URL: https://github.com/apache/pulsar/issues/7517#issuecomment-663863578
@sijie Sorry for the delay. I am willing to take over this issue. I have added a new test case, [`testAcquireOwnershipWithZookeeperDisconnectedAfterOwnershipNodeCreated`](https://github.com/kezhuw/pulsar/commit/bbe9bdd2244ca051c6fe4efb90aad66b5d079375#diff-29ccb5c3ba685ffcabe9df5e9fd7e841R193), which also fails because ZooKeeper is disconnected.

Before opening a formal pull request, I think we should converge on how to fix this issue, to avoid substantial divergence. In my opinion, there are two possible approaches to fix this issue or reduce its likelihood:

1. Retry on certain errors until success or session expiry.
2. Reestablish existing ownership when querying and acquiring ownership.

I think the first approach cannot, or can only with difficulty, guarantee correctness, for these reasons:

* It is hard to take appropriate action for every error condition.
* It can't handle a disconnected-connected-disconnected-... dance.

That said, the retry approach does make the API more usable and caller-friendly.

In contrast, the second approach admits that we cannot return a correct result under certain conditions, but a manual retry will return the correct result once that possibly temporary condition has been resolved.

So I lean toward fixing this issue by reestablishing existing ownership in later ownership queries and acquisitions. In the future, we can retry on failure automatically to improve caller friendliness without sacrificing correctness.
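To make the second approach concrete, here is a rough sketch of the idea, not the actual Pulsar code. Everything in it is hypothetical: `OwnershipSketch`, `FakeStore`, `ConnectionLossException`, and the `dropNextReply` switch are stand-ins invented for illustration, and an in-memory map plays the role of the ZooKeeper ownership nodes. The point is only that a later query or acquire trusts what the store says rather than the outcome of the earlier failed call.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Pulsar or ZooKeeper.
class OwnershipSketch {

    /** Simulated "reply lost" error: the write may or may not have been applied. */
    static final class ConnectionLossException extends RuntimeException {
        ConnectionLossException() { super("connection loss"); }
    }

    /** In-memory stand-in for the ownership nodes: bundle -> owning broker. */
    static final class FakeStore {
        final Map<String, String> nodes = new ConcurrentHashMap<>();
        boolean dropNextReply = false; // simulate a disconnect after the node is created

        void createIfAbsent(String bundle, String owner) {
            nodes.putIfAbsent(bundle, owner);
            if (dropNextReply) {
                dropNextReply = false;
                throw new ConnectionLossException(); // node exists, but the caller cannot know
            }
        }

        Optional<String> read(String bundle) {
            return Optional.ofNullable(nodes.get(bundle));
        }
    }

    private final String self;
    private final FakeStore store;
    private final Set<String> ownedLocally = ConcurrentHashMap.newKeySet();

    OwnershipSketch(String self, FakeStore store) {
        this.self = self;
        this.store = store;
    }

    /** Query path: if the store says this broker owns the bundle, re-adopt it locally. */
    Optional<String> getOwner(String bundle) {
        Optional<String> owner = store.read(bundle);
        if (owner.isPresent() && owner.get().equals(self)) {
            ownedLocally.add(bundle); // reestablish ownership left behind by a failed acquire
        } else {
            ownedLocally.remove(bundle);
        }
        return owner;
    }

    /** Acquire path: check for an existing owner first, then try to create the node. */
    String tryAcquire(String bundle) {
        Optional<String> current = getOwner(bundle);
        if (current.isPresent()) {
            return current.get();
        }
        store.createIfAbsent(bundle, self); // may throw; the caller retries manually later
        return getOwner(bundle).get();      // whoever won the create is the owner
    }

    boolean ownsLocally(String bundle) {
        return ownedLocally.contains(bundle);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        OwnershipSketch broker = new OwnershipSketch("broker-1", store);

        store.dropNextReply = true;
        try {
            broker.tryAcquire("ns/bundle-0");
        } catch (ConnectionLossException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        // The manual retry finds the node created earlier and reestablishes ownership.
        System.out.println(broker.tryAcquire("ns/bundle-0")); // prints broker-1
    }
}
```

In this sketch the first call fails without a usable result, which is the condition the second approach accepts. The retry succeeds without any special error handling, because both paths start from the stored state.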
