FYI - I added a new method to Timing to help with this: forSessionSleep() The new version of KillSession inserts an eventOfDeath directly into the client. So, no, it’s not a real session kill like it used to be. But, now it’s more reliable. Use timing.forSessionSleep() to wait for session timeout.
-JZ > On Feb 7, 2016, at 8:21 PM, Scott Blum <[email protected]> wrote: > > Can you describe the change then? Because kill session doesn't seem to now > ensure that ephemeral nodes bound the the killed session disappear in a > timely manner > > On Feb 7, 2016 8:03 PM, "Jordan Zimmerman" <[email protected] > <mailto:[email protected]>> wrote: >> Are we using a new zookeeper? > > > In Curator 3.0 with “new” connection state handling which simulates a Session > timeout. However, all tests are run twice. First with the old handling and > then with the new. > >> Or did something change with our implementation of KillSession.kill()? > > KillSession did change though. A change I made to ZK got added in 3.5 and we > now use that. > > -Jordan > >> On Feb 7, 2016, at 1:55 PM, Scott Blum <[email protected] >> <mailto:[email protected]>> wrote: >> >> Are we using a new zookeeper? Or did something change with our >> implementation of KillSession.kill()? >> >> Or maybe there's a timing issue with Curator's ConnectionStateManager State >> change: LOST? I don't understand how we could get a LOST event without the >> ephemeral node attached to that session having disappeared? >> >> On Sun, Feb 7, 2016 at 10:59 AM, Jordan Zimmerman >> <[email protected] <mailto:[email protected]>> wrote: >> I don’t know if anything changed in ZooKeeper itself. I know that the >> connection states changed in Curator, but Curator now tests both the old >> mode and the new mode and they both fail here. >> >> -JZ >> >>> On Feb 7, 2016, at 1:06 AM, Scott Blum <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 branch >>> is that the ephemeral node /test/me created in testKilledSession() really >>> isn't disappearing when it should. >>> >>> After the session loss and the reconnect, /test still shows 2 children >>> [foo, me] and /test/me still returns a node. >>> >>> Any idea why the timing here would have changed? >>> >>> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <[email protected] >>> <mailto:[email protected]>> wrote: >>> https://issues.apache.org/jira/browse/CURATOR-302 >>> <https://issues.apache.org/jira/browse/CURATOR-302> >>> >>> I need to trace through what's really going on under the hood rather than >>> band-aid the test. Should be able to in next couple of days. >>> >>> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman >>> <[email protected] <mailto:[email protected]>> wrote: >>> Any update on this? I think you should create a Jira for it. >>> >>> -Jordan >>> >>> >>>> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> >>>> BTW, this test passes on master... so it's some kind of 3.0 vs. master >>>> issue. I think I'm going to just have to dump in a ton of log messages >>>> and see what differs. >>>> >>>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman >>>> <[email protected] <mailto:[email protected]>> wrote: >>>> OK - please create a new Issue in Jira for this. >>>> >>>> -Jordan >>>> >>>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> >>>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been >>>>> broken for a while. Maybe I'll have to git bisect... >>>>> >>>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> Okay, so I looked into this for a bit, and I hit kind of a wall. I think >>>>> there is a legit bug/race in TreeCache, and the following patch *should* >>>>> remedy: >>>>> >>>>> diff --git >>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>> >>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>> index df4403c..a4a022b 100644 >>>>> --- >>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>> +++ >>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable >>>>> void wasDeleted() throws Exception >>>>> { >>>>> ChildData oldChildData = childData.getAndSet(null); >>>>> - >>>>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path); >>>>> ConcurrentMap<String, TreeNode> childMap = >>>>> children.getAndSet(null); >>>>> if ( childMap != null ) >>>>> { >>>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable >>>>> case RECONNECTED: >>>>> try >>>>> { >>>>> + outstandingOps.incrementAndGet(); >>>>> root.wasReconnected(); >>>>> publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>> + if ( outstandingOps.decrementAndGet() == 0 ) >>>>> + { >>>>> + if ( isInitialized.compareAndSet(false, true) ) >>>>> + { >>>>> + publishEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> + } >>>>> + } >>>>> } >>>>> catch ( Exception e ) >>>>> { >>>>> >>>>> That should guarantee that the initialized event gets deferred until all >>>>> outstanding refreshes finish.. but it's not. Something seems to have >>>>> changed under the hood in how background events are getting sent to >>>>> TreeCache, and I don't really understand it yet. And running the >>>>> debugger seems to affect the timing, like something racy is going on. :( >>>>> >>>>> >>>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> Ok, that is kind of weird. I'll take a look. >>>>> >>>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman >>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>> No, sorry. The last few lines of the test currently are: >>>>> >>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>> "data".getBytes()); >>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> >>>>> This fails. But, if I switch them it works: >>>>> >>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>> "data".getBytes()); >>>>> >>>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> >>>>>> So you end up with 2 initialized events? >>>>>> >>>>>> You mean this? >>>>>> >>>>>> assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>>> + assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>> "data".getBytes()); >>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> >>>>>> Seems weird if there are two, but I can help look. >>>>>> >>>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman >>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>> Hey Scott, >>>>>> >>>>>> In this branch, TestTreeCache.testKilledSession() is failing at: >>>>>> >>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>> "data".getBytes()); >>>>>> >>>>>> However, if I change the two asserts to: >>>>>> >>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>> "data".getBytes()); >>>>>> >>>>>> it works. Does that make any sense? >>>>>> >>>>>> -Jordan >>>>>> >>>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman >>>>>> > <[email protected] <mailto:[email protected]>> wrote: >>>>>> > >>>>>> > Devs, >>>>>> > >>>>>> > In trying to fix the bad log message "Failed to find watcher” (which >>>>>> > turns out to be a ZK client issue), I realize that the >>>>>> > NamespaceWatcher and WatcherWrapper stuff could be improved. I’m still >>>>>> > working on getting all tests to pass but I’d appreciate more sets of >>>>>> > eyes on this change. Please review carefully if you can. >>>>>> > >>>>>> > https://github.com/apache/curator/pull/131 >>>>>> > <https://github.com/apache/curator/pull/131> >>>>>> > >>>>>> > -Jordan >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> >
