kezhuw commented on issue #7517:
URL: https://github.com/apache/pulsar/issues/7517#issuecomment-663863578


   @sijie Sorry for the delay. I am willing to take over this issue. I have 
added a new test case, 
[`testAcquireOwnershipWithZookeeperDisconnectedAfterOwnershipNodeCreated`](https://github.com/kezhuw/pulsar/commit/bbe9bdd2244ca051c6fe4efb90aad66b5d079375#diff-29ccb5c3ba685ffcabe9df5e9fd7e841R193),
 which also fails because ZooKeeper gets disconnected.
   
   Before opening a formal pull request, I think we should agree on how to fix 
this issue, to avoid substantial divergence.
   
   In my opinion, there are two possible approaches to fix this issue or reduce 
its likelihood:
   1. Retry on certain errors until success or session expiry.
   2. Re-establish existing ownership during ownership querying and acquiring.
   
   I think the first approach cannot, or can only with difficulty, guarantee 
correctness, for these reasons:
   * It is hard to take the appropriate action for every error condition.
   * It can't handle a disconnected-connected-disconnected-... cycle.

   That said, I think the retry approach does improve API usability and caller 
friendliness.
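
   To make the first approach concrete, here is a minimal sketch of such a 
retry loop. Everything in it is hypothetical (`RetrySketch`, `Outcome`, and the 
attempt budget are illustrative names, not Pulsar or ZooKeeper APIs). It only 
shows the shape of the loop: retry while the connection is lost, and stop on 
success or session expiry.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Pulsar or ZooKeeper.
final class RetrySketch {
    enum Outcome { OK, CONNECTION_LOSS, SESSION_EXPIRED }

    // Retries `op` while it reports CONNECTION_LOSS. Returns the first
    // decisive outcome (OK or SESSION_EXPIRED), or CONNECTION_LOSS if the
    // attempt budget runs out first.
    static Outcome retry(Supplier<Outcome> op, int maxAttempts) {
        Outcome last = Outcome.CONNECTION_LOSS;
        for (int i = 0; i < maxAttempts && last == Outcome.CONNECTION_LOSS; i++) {
            last = op.get();
        }
        return last;
    }
}
```

   If the connection keeps flapping, every attempt can report a connection 
loss, so the loop either never terminates or has to give up without a decisive 
answer once the budget is spent.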
   
   In contrast, the second approach admits that we cannot return a correct 
result under certain conditions, but a manual retry can return a correct result 
once the (possibly temporary) condition has been resolved.
   
   So I lean toward fixing this issue by re-establishing existing ownership in 
later ownership querying and acquiring. In the future, we can retry 
automatically on failure to improve caller friendliness without sacrificing 
correctness.
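
   As a rough illustration of re-establishing ownership, here is a small 
in-memory sketch. It is an assumption-laden model, not Pulsar's ownership 
cache: `OwnershipSketch`, the `store` map standing in for ephemeral ZooKeeper 
nodes, and the `failAfterCreate` flag are all hypothetical. It models the case 
where the ownership node is created but the caller never sees the result, and a 
later query or acquire repairs the local view.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; not Pulsar's actual ownership API.
final class OwnershipSketch {
    private final String self;
    private final Map<String, String> store;            // stands in for ephemeral nodes: path -> owner
    private final Set<String> owned = new HashSet<>();  // this broker's local view of what it owns
    boolean failAfterCreate = false;                    // simulates a disconnect right after node creation

    OwnershipSketch(String self, Map<String, String> store) {
        this.self = self;
        this.store = store;
    }

    // Tries to acquire ownership; returns the owner, or throws if the outcome is unknown.
    String acquire(String path) {
        String existing = store.putIfAbsent(path, self);
        if (existing == null && failAfterCreate) {
            // The node exists now, but the caller never learns about it.
            throw new IllegalStateException("connection lost");
        }
        return resolve(path, existing == null ? self : existing);
    }

    // Looks up the owner; returns null if nobody owns the path.
    String query(String path) {
        String owner = store.get(path);
        return owner == null ? null : resolve(path, owner);
    }

    // Re-establishes local ownership whenever the stored node turns out to be ours.
    private String resolve(String path, String owner) {
        if (self.equals(owner)) {
            owned.add(path);
        }
        return owner;
    }

    boolean ownsLocally(String path) {
        return owned.contains(path);
    }
}
```

   In this model the failed `acquire` leaves the local view stale, but the 
next `query` or `acquire` for the same path finds the broker's own node and 
restores it, which is the manual-retry behavior described above.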


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

