[ 
https://issues.apache.org/jira/browse/CURATOR-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178253#comment-16178253
 ] 

Su Robi edited comment on CURATOR-325 at 9/24/17 3:37 PM:
----------------------------------------------------------

[~randgalt] [~wlongdu]

Hi, I seem to have hit a similar problem.

After diving into the code, I found that this problem is caused by reading
data (getData or getChildren) with a `RetryForever`-like retry policy inside
our custom watcher implementation.

As a result, when the session expires, the EventThread may fall into an
infinite retry loop inside the custom watcher. Curator's own watcher,
`ConnectionState#process`, then never gets a chance to run
handleExpiredSession and make `ClientCnxn#state` alive again (which is needed
to break the infinite loop).
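
To illustrate the blocking described above, here is a minimal sketch that uses
only the JDK. It is not ZooKeeper or Curator code: a single-thread executor
stands in for the EventThread, and the class, method, and variable names are
hypothetical. The "user watcher" retries until a flag is set, but the
"framework watcher" that would set the flag is queued behind it on the same
thread. The retry loop is bounded by a deadline only so that the demo
terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class BlockedEventThreadSketch {

    /**
     * Returns true only if the "framework watcher" managed to run while the
     * "user watcher" was still retrying on the same single event thread.
     */
    static boolean frameworkWatcherRanDuringRetry(long retryWindowMillis) {
        // Assumption: one thread delivers all events in order, like an event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicBoolean sessionRevived = new AtomicBoolean(false);
        AtomicBoolean sawRevival = new AtomicBoolean(false);
        try {
            // "User watcher": retries a read until the session is alive again.
            // With a forever policy there would be no deadline and no exit.
            eventThread.execute(() -> {
                long deadline = System.nanoTime()
                        + TimeUnit.MILLISECONDS.toNanos(retryWindowMillis);
                while (!sessionRevived.get() && System.nanoTime() < deadline) {
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                sawRevival.set(sessionRevived.get());
            });
            // "Framework watcher": would revive the session, but it is queued
            // behind the user watcher and cannot start until that one returns.
            eventThread.execute(() -> sessionRevived.set(true));
            eventThread.shutdown();
            eventThread.awaitTermination(retryWindowMillis + 2000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            eventThread.shutdownNow();
        }
        return sawRevival.get();
    }

    public static void main(String[] args) {
        System.out.println("framework watcher ran during retry: "
                + frameworkWatcherRanDuringRetry(200));
    }
}
```

In this sketch the method returns false: the second task cannot start before
the first one gives up, which mirrors why an unbounded retry inside a watcher
never sees the session recover.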

This problem can be worked around without modifying zookeeper/curator:

- don't use a forever retry policy, and accept that the loop spins for
"a while"
- or, like `PathCache` does, hand the task off to another thread after
receiving the WatchedEvent
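
The two workarounds above can be sketched with plain JDK classes. This is an
illustration under stated assumptions, not Curator's implementation: the
executors stand in for the EventThread and a worker thread, and
`callWithBoundedRetries`, `readSucceedsWithHandoff`, and the flag are
hypothetical names. The user watcher only hands the blocking read to a worker
and returns, so the framework watcher queued behind it can run.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class WatcherHandoffSketch {

    /** Workaround 1: retry a bounded number of times instead of forever. */
    static <T> T callWithBoundedRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    /** Workaround 2: the watcher hands the read to a worker and returns at once. */
    static boolean readSucceedsWithHandoff() {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        AtomicBoolean sessionRevived = new AtomicBoolean(false);
        CompletableFuture<String> result = new CompletableFuture<>();
        // Simulated read that fails until the session has been revived.
        Callable<String> read = () -> {
            if (!sessionRevived.get()) {
                throw new IllegalStateException("session expired");
            }
            return "data";
        };
        try {
            // "User watcher": submits the retrying read to the worker thread.
            eventThread.execute(() -> worker.execute(() -> {
                try {
                    result.complete(callWithBoundedRetries(read, 200, 10));
                } catch (Exception e) {
                    result.completeExceptionally(e);
                }
            }));
            // "Framework watcher": now free to run on the event thread.
            eventThread.execute(() -> sessionRevived.set(true));
            return "data".equals(result.get(5, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("read succeeded: " + readSucceedsWithHandoff());
    }
}
```

The bounded retry limits how long any single caller can spin, while the
handoff keeps the event thread free; in a real client the two would likely be
combined.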

Still, it seems like a hole that a user-defined watcher can block the
framework watcher, even though the framework watcher is vital for the user's
watcher to finish its work.

Is there anything Curator can do to improve this?




> Background retry falls into infinite loop of SessionExpiredException
> --------------------------------------------------------------------
>
>                 Key: CURATOR-325
>                 URL: https://issues.apache.org/jira/browse/CURATOR-325
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.9.1, 2.10.0
>         Environment: sun java jdk 1.7.0_55, curator 2.9.1, zookeeper 3.4.6
>            Reporter: clive du
>              Labels: SessionExpiredException, loop
>
> After a long GC pause, longer than the ZooKeeper session timeout, the
> ZooKeeper cluster invalidates the session id held by the client and waits
> for the client to reconnect. However, the client treats
> SessionExpiredException as a retryable exception and re-queues the operation
> on the background queue, so we get the following stack trace indefinitely:
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retrying operation
> 12:50:54.337 [configuration-0-EventThread] DEBUG org.apache.curator.RetryLoop - Retry-able exception received
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /dynamic/apps/258741001/DEV
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293) ~[curator-framework-2.10.0.jar:na]
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[curator-client-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:105) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$1.forPath(GetDataBuilderImpl.java:65) [curator-framework-2.10.0.jar:na]
>     at com.ctrip.flight.configuration.client.AbstractZookeeperClient.getData(AbstractZookeeperClient.java:68) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.getPublishNodeValue(ZooKeeperConfigurationSource.java:258) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource.access$100(ZooKeeperConfigurationSource.java:45) [classes/:na]
>     at com.ctrip.flight.configuration.client.ZooKeeperConfigurationSource$1.nodeChanged(ZooKeeperConfigurationSource.java:105) [classes/:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:na]
>     at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:310) [guava-19.0.jar:na]
>     at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122) [curator-recipes-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522) [curator-framework-2.10.0.jar:na]
>     at org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:256) [curator-framework-2.10.0.jar:na]
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561) [zookeeper-3.4.6.jar:3.4.6-1569965]
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) [zookeeper-3.4.6.jar:3.4.6-1569965]



