[
https://issues.apache.org/jira/browse/CURATOR-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178253#comment-16178253
]
Su Robi edited comment on CURATOR-325 at 9/24/17 3:37 PM:
----------------------------------------------------------
[~randgalt] [~wlongdu]
Hi, I seem to have hit a similar problem.
After diving into the code, I found that it is caused by reading data (getData
or getChildren) with a `RetryForever`-like retry policy inside our custom
watcher implementation.
As a result, when the session expires, the EventThread may fall into an
infinite retry loop inside the custom watcher. Curator's own watcher,
`ConnectionState#process`, then never gets a chance to run
handleExpiredSession and make `ClientCnxn#state` alive again (which is needed
to break the infinite loop).
This problem can be worked around without modifying ZooKeeper/Curator:
- don't use a retry-forever policy, so the retry loop only lasts for "a while"
- or, as `PathCache` does, hand the task off to another thread after receiving
the WatchedEvent
But it seems like a hole to me that a user-defined watcher can block the
framework's watcher, while the framework's watcher is vital for the user's
watcher to finish its work.
Is there anything Curator can do to improve this? ^ ^
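To illustrate the second workaround above (handing the work to another thread), here is a minimal sketch using only JDK executors. It is a model of the situation, not Curator or ZooKeeper code: the class `WatcherHandoffSketch`, the `sessionAlive` flag and the two executors are hypothetical stand-ins for the EventThread, the state that the framework's watcher repairs, and a user-owned worker thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model, not Curator/ZooKeeper code: a single "event thread"
// delivers events strictly in order, like ZooKeeper's EventThread.
public class WatcherHandoffSketch {

    // Returns true if the user's work finished within 500 ms.
    public static boolean run(boolean handOff) {
        // Stand-in for the state that only the framework's watcher can repair.
        AtomicBoolean sessionAlive = new AtomicBoolean(false);
        CountDownLatch userDone = new CountDownLatch(1);
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();

        // Stand-in for a read under a retry-forever policy: it can only
        // succeed after the framework's watcher has run.
        Runnable userWork = () -> {
            try {
                while (!sessionAlive.get()) {
                    Thread.sleep(10);
                }
                userDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            // Event 1: the user's watcher, either doing the work inline on
            // the event thread or handing it off to the worker.
            if (handOff) {
                eventThread.execute(() -> worker.execute(userWork));
            } else {
                eventThread.execute(userWork);
            }
            // Event 2: the framework's watcher, queued behind event 1.
            eventThread.execute(() -> sessionAlive.set(true));

            return userDone.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts a still-spinning retry loop so the JVM can exit.
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline:   finished=" + run(false));
        System.out.println("hand-off: finished=" + run(true));
    }
}
```

In this model the inline variant should time out, because the event that would end the retry loop is queued behind the loop itself, while the hand-off variant should finish once the event thread is free to deliver the second event. A real watcher would still need to decide what to do with work that is pending when the session is finally declared expired.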
(Author: robiplus)
> Background retry falls into infinite loop of SessionExpiredException
> --------------------------------------------------------------------
>
> Key: CURATOR-325
> URL: https://issues.apache.org/jira/browse/CURATOR-325
> Project: Apache Curator
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.9.1, 2.10.0
> Environment: sun java jdk 1.7.0_55, curator 2.9.1, zookeeper 3.4.6
> Reporter: clive du
> Labels: SessionExpiredException, loop
>
> After a long GC pause, longer than the ZooKeeper session timeout, the
> ZooKeeper cluster invalidates the session id held by the client and waits for
> the client to reconnect. However, the client treats SessionExpiredException
> as a retryable exception and puts the operation back on the background queue,
> so we get the following stack trace endlessly:
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retrying operation
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retry-able exception received
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /dynamic/apps/258741001/DEV
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[curator-client-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:105) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:65) [curator-framework-2.10.0.jar:na]
>     at com.ctrip.flight.configuration.client.AbstractZookeeperClient.getData(AbstractZookeeperClient.java:68) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.getPublishNodeValue(ZooKeeperConfigurationSource.java:258) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.access$100(ZooKeeperConfigurationSource.java:45) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource$1.nodeChanged(ZooKeeperConfigurationSource.java:105) [classes/:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:na]
>     at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) [guava-19.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:256) [curator-framework-2.10.0.jar:na]
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561) [zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) [zookeeper-3.4.6.jar:3.4.6-1569965]