[ https://issues.apache.org/jira/browse/HELIX-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564149#comment-16564149 ]

Jiajun Wang commented on HELIX-748:
-----------------------------------

Good point. We shall certainly do that.

Besides this concern, we need to resolve another issue as well. Note that in the 
proposed code, we keep retrying on any Exception except ZkException or 
InterruptedException. This could be dangerous: if any callback logic throws an 
arbitrary Exception because of its business logic, the client call will keep 
retrying forever.
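The following is a minimal sketch of the retry policy described above that makes the hazard concrete. It is not the actual proposed code. ZkException is a local stand-in for the ZkClient class of the same name. The maxAttemptsForDemo cap is a hypothetical addition that lets the demo terminate. Under the policy as described there is no cap, so a callback that always fails for a business reason would loop forever.

```java
import java.util.concurrent.Callable;

public class RetryHazardDemo {
  // Local stand-in for the ZkClient library's ZkException (assumed unchecked here).
  static class ZkException extends RuntimeException {
    ZkException(String msg) {
      super(msg);
    }
  }

  // Mirrors the described policy: stop only on ZkException or InterruptedException,
  // retry on every other Exception. maxAttemptsForDemo is hypothetical and exists
  // only so this demo terminates; returns the number of attempts made.
  static int retryCountingAttempts(Callable<?> op, int maxAttemptsForDemo) throws Exception {
    int attempts = 0;
    while (attempts < maxAttemptsForDemo) {
      attempts++;
      try {
        op.call();
        return attempts; // success
      } catch (ZkException | InterruptedException e) {
        throw e; // the only exceptions that end the retry
      } catch (Exception e) {
        // Everything else, including business-logic failures, is retried.
      }
    }
    return attempts; // without the demo cap, this line would never be reached
  }

  public static void main(String[] args) throws Exception {
    // A callback that always fails for a business reason unrelated to ZooKeeper.
    Callable<Object> badCallback = () -> {
      throw new IllegalStateException("business rule violated");
    };
    System.out.println("attempts: " + retryCountingAttempts(badCallback, 1000)); // attempts: 1000
  }
}
```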

So, there are 2 options:
 # Check all possible Exceptions thrown by the Zk operation call, and only throw 
KeeperExceptions, so we know when to retry and when to stop.
 # Change the ZkConnection processing logic to ensure the connection is never null. In 
this case, any remaining exception must be related to business logic, so we can safely end 
the retry. To implement this, we can add atomic connection swap logic, 
so that the ZkConnection ref is always valid.
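Option 2 could look roughly like the sketch below. Connection and ConnectionLossException are hypothetical stand-ins for ZkConnection and the retriable KeeperExceptions. They are not Helix or ZooKeeper types. java.util.concurrent.atomic.AtomicReference provides the swap. The replacement connection is created before it is published with getAndSet, so readers see either the old connection or the new one, never null. Any exception other than a connection loss therefore propagates on the first attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

public class AtomicConnectionSwap {
  // Hypothetical stand-in for ZkConnection; not the real Helix/ZkClient type.
  interface Connection {
    int id();
    void close();
  }

  // Hypothetical stand-in for a retriable KeeperException such as connection loss.
  static class ConnectionLossException extends RuntimeException {
    ConnectionLossException() {
      super("connection lost");
    }
  }

  private final Supplier<Connection> factory;
  private final AtomicReference<Connection> current;

  AtomicConnectionSwap(Supplier<Connection> factory) {
    this.factory = factory;
    // Initialized non-null and only ever replaced, never cleared.
    this.current = new AtomicReference<>(factory.get());
  }

  Connection connection() {
    return current.get();
  }

  // Create the replacement first, then publish it in a single atomic step,
  // so concurrent readers see either the old or the new connection, never null.
  void reset() {
    Connection fresh = factory.get();
    Connection old = current.getAndSet(fresh);
    old.close();
  }

  // Retry only on connection loss. Since the connection ref cannot be null,
  // any other exception comes from the operation itself and is propagated.
  <T> T retryUntilConnected(Function<Connection, T> op) {
    while (true) {
      try {
        return op.apply(current.get());
      } catch (ConnectionLossException e) {
        // Retriable: loop and use whichever connection is current now.
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger ids = new AtomicInteger();
    List<Integer> closed = new ArrayList<>();
    AtomicConnectionSwap client = new AtomicConnectionSwap(() -> {
      int id = ids.incrementAndGet();
      return new Connection() {
        public int id() { return id; }
        public void close() { closed.add(id); }
      };
    });

    client.reset();
    System.out.println("current id: " + client.connection().id() + ", closed: " + closed);

    AtomicInteger calls = new AtomicInteger();
    String result = client.retryUntilConnected(conn -> {
      if (calls.incrementAndGet() < 3) {
        throw new ConnectionLossException();
      }
      return "ok on attempt " + calls.get();
    });
    System.out.println(result);
  }
}
```

This sketch assumes that an operation still running on the old connection when it is closed would fail with a connection loss and retry against the new connection. Whether the real ZkConnection behaves that way would need to be checked.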

Based on our investigation, option 2 seems to be the cleaner design. ZkConnection 
is used everywhere, and any possibility of this ref being null means more error 
handling work.

> ZkClient should not throw Exception when internal ZkConnection is reset
> -----------------------------------------------------------------------
>
>                 Key: HELIX-748
>                 URL: https://issues.apache.org/jira/browse/HELIX-748
>             Project: Apache Helix
>          Issue Type: Task
>            Reporter: Jiajun Wang
>            Assignee: Jiajun Wang
>            Priority: Major
>
> It is noticed that ZkClient throws an exception because ZkConnection == 
> null while it is being reset.
> This could be caused by expired-session handling. According to the 
> design, a ZkClient operation should wait until the reset is done, instead of 
> breaking the retry.
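The expected behavior in the description above is that operations wait for a reset instead of failing. A read/write lock could approximate this. Operations share the read lock, and the reset holds the write lock while it reconnects. ResetGate, observeDuringReset and the integer "generation" (standing in for the connection) are hypothetical names chosen for illustration. They are not Helix APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.IntFunction;

public class ResetGate {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Hypothetical connection "generation"; a real client would hold its ZkConnection here.
  private int generation = 1;

  // Operations share the read lock: they run concurrently with each other,
  // but block while a reset holds the write lock.
  <T> T runOperation(IntFunction<T> op) {
    lock.readLock().lock();
    try {
      return op.apply(generation);
    } finally {
      lock.readLock().unlock();
    }
  }

  // The reset holds the write lock, so operations issued meanwhile wait
  // until the new connection is in place instead of failing.
  void reset(Runnable reconnect) {
    lock.writeLock().lock();
    try {
      reconnect.run(); // e.g. close the expired session and open a new one
      generation++;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Starts a reset on another thread, then issues an operation while the
  // reset is still in progress; returns the generation the operation saw.
  static int observeDuringReset() throws InterruptedException {
    ResetGate gate = new ResetGate();
    CountDownLatch resetStarted = new CountDownLatch(1);
    CountDownLatch allowFinish = new CountDownLatch(1);

    Thread resetter = new Thread(() -> gate.reset(() -> {
      resetStarted.countDown();
      try {
        allowFinish.await(); // simulate a slow reconnect
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
    resetter.start();
    resetStarted.await(); // the reset now holds the write lock

    // Release the reconnect after a short delay; the operation cannot run before then.
    Thread releaser = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      allowFinish.countDown();
    });
    releaser.start();

    int seen = gate.runOperation(g -> g); // blocks until the reset completes
    resetter.join();
    releaser.join();
    return seen;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("generation seen by operation: " + observeDuringReset()); // 2
  }
}
```

With a read/write lock, normal operations stay concurrent with each other and are only serialized against the reset. This approach could be combined with a connection reference that is never null.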



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
