[ https://issues.apache.org/jira/browse/HELIX-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558786#comment-16558786 ]
Jiajun Wang commented on HELIX-748:
-----------------------------------

Change to something like this:

  public <T> T retryUntilConnected(final Callable<T> callable)
      throws IllegalArgumentException, ZkException {
    if (_zookeeperEventThread != null && Thread.currentThread() == _zookeeperEventThread) {
      throw new IllegalArgumentException("Must not be done in the zookeeper event thread.");
    }
    final long operationStartTime = System.currentTimeMillis();
    while (true) {
      if (_closed) {
        throw new IllegalStateException("ZkClient already closed!");
      }
      try {
        final ZkConnection zkConnection = (ZkConnection) getConnection();
        // Validate that the connection is not null before triggering the callback
        if (zkConnection == null || zkConnection.getZookeeper() == null) {
          LOG.debug(
              "ZkConnection is in invalid state! Retry until timeout or ZkClient closed.");
        } else {
          return callable.call();
        }
      } catch (InterruptedException e) {
        throw new ZkInterruptedException(e);
      } catch (Exception e) {
        // we give the ZkClient some time to fix the connection issue.
        Thread.yield();
        waitForRetry();
      }
      // before attempting a retry, check whether retry timeout has elapsed
      if (System.currentTimeMillis() - operationStartTime > _operationRetryTimeoutInMillis) {
        throw new ZkTimeoutException(
            "Operation cannot be retried because of retry timeout ("
                + _operationRetryTimeoutInMillis + " milli seconds)");
      }
    }
  }

We still need to check for corner cases and add test cases.

> ZkClient should not throw Exception when internal ZkConnection is reset
> -----------------------------------------------------------------------
>
>                 Key: HELIX-748
>                 URL: https://issues.apache.org/jira/browse/HELIX-748
>             Project: Apache Helix
>          Issue Type: Task
>            Reporter: Jiajun Wang
>            Assignee: Jiajun Wang
>            Priority: Major
>
> ZkClient has been observed to throw an exception because ZkConnection ==
> null while the connection is being reset.
> This could be caused by the handling of an expiring session. According to the
> design, a ZkClient operation should wait until the reset is done instead of
> breaking out of the retry loop.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
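
As a starting point for the test cases mentioned in the comment, here is a minimal, self-contained sketch of the corner case in question: an operation is issued while the connection is null (mid-reset) and should wait for the connection to come back rather than fail. Everything in it is a hypothetical stand-in and not Helix or ZkClient code: RetrySketch, setConnection, callDuringReset and RetryTimeoutException are invented names, a plain Object takes the place of ZkConnection, and a fixed 10 ms sleep takes the place of waitForRetry(). Like the proposal, it treats any non-interrupt exception from the callable as retryable, which is an assumption and may be too broad in practice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for ZkClient's retry loop; not the Helix implementation. */
final class RetrySketch {
  /** Hypothetical unchecked timeout, playing the role of ZkTimeoutException. */
  static final class RetryTimeoutException extends RuntimeException {
    RetryTimeoutException(String message) { super(message); }
  }

  // Stand-in for the ZkConnection; null means "reset in progress".
  private final AtomicReference<Object> connection = new AtomicReference<>();
  private final long retryTimeoutMillis;
  private volatile boolean closed;

  RetrySketch(long retryTimeoutMillis) { this.retryTimeoutMillis = retryTimeoutMillis; }

  void setConnection(Object newConnection) { connection.set(newConnection); }

  void close() { closed = true; }

  <T> T retryUntilConnected(Callable<T> callable) {
    final long start = System.currentTimeMillis();
    while (true) {
      if (closed) {
        throw new IllegalStateException("Client already closed!");
      }
      // Only invoke the callable when a connection is present.
      if (connection.get() != null) {
        try {
          return callable.call();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while retrying", e);
        } catch (Exception e) {
          // Assumption: treated as transient here, so fall through and retry.
        }
      }
      // Before attempting a retry, check whether the retry timeout has elapsed.
      if (System.currentTimeMillis() - start > retryTimeoutMillis) {
        throw new RetryTimeoutException("Retry timeout (" + retryTimeoutMillis + " ms)");
      }
      // Back off briefly so a null connection does not cause a busy loop.
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting to retry", e);
      }
    }
  }

  /** Simulates a reset that completes after resetMillis, then runs one operation. */
  static String callDuringReset(long resetMillis, long timeoutMillis) {
    RetrySketch client = new RetrySketch(timeoutMillis);
    Thread resetter = new Thread(() -> {
      try {
        Thread.sleep(resetMillis);
      } catch (InterruptedException e) {
        return;
      }
      client.setConnection(new Object()); // the reset is done
    });
    resetter.setDaemon(true);
    resetter.start();
    return client.retryUntilConnected(() -> "ok");
  }

  public static void main(String[] args) {
    // The connection comes back well within the timeout, so the call succeeds.
    System.out.println(callDuringReset(100, 2000)); // prints "ok"
  }
}
```

A second case worth covering is a reset that outlasts the retry timeout: callDuringReset(500, 50) should end in the timeout exception, not in a failure caused by the null connection.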