Well, you could add back an alternate version KillSession that works as it used it. I’m OK with that.
-JZ > On Feb 8, 2016, at 12:06 PM, Scott Blum <[email protected]> wrote: > > Okay, I took a look, and I couldn't see how to use timing.forSessionSleep() > in this test correctly. It's not the test harness I'm trying to block. What > I'm trying to test involves seeing how TreeCache reacts to a real session > loss. The fact that KillSession isn't a real session loss anymore is > therefore impeding my ability to test what I was trying to test before. > > Let me explain. The following manual test does what I want: > > 1) Connect a TreeCache to a remote ZK server through an ssh tunnel. > 2) Follow the node creation steps in testKilledSession() > 3) Kill the tunnel, wait for session loss, restore the tunnel. > > So I guess my question is, is there any way to "really" kill the session, > server side? Or should I formulate this test completely differently? > > Another way to look at this is that I can't synchronize my test steps to > TreeCache's internal operations effectively. > > > On Sun, Feb 7, 2016 at 10:36 PM, Scott Blum <[email protected] > <mailto:[email protected]>> wrote: > Oooh... let me try that. > > On Sun, Feb 7, 2016 at 8:29 PM, Jordan Zimmerman <[email protected] > <mailto:[email protected]>> wrote: > FYI - I added a new method to Timing to help with this: > forSessionSleep() > The new version of KillSession inserts an eventOfDeath directly into the > client. So, no, it’s not a real session kill like it used to be. But, now > it’s more reliable. Use timing.forSessionSleep() to wait for session timeout. > > -JZ > > >> On Feb 7, 2016, at 8:21 PM, Scott Blum <[email protected] >> <mailto:[email protected]>> wrote: >> >> Can you describe the change then? Because kill session doesn't seem to now >> ensure that ephemeral nodes bound the the killed session disappear in a >> timely manner >> >> On Feb 7, 2016 8:03 PM, "Jordan Zimmerman" <[email protected] >> <mailto:[email protected]>> wrote: >>> Are we using a new zookeeper? >> >> >> In Curator 3.0 with “new” connection state handling which simulates a >> Session timeout. However, all tests are run twice. First with the old >> handling and then with the new. >> >>> Or did something change with our implementation of KillSession.kill()? >> >> KillSession did change though. A change I made to ZK got added in 3.5 and we >> now use that. >> >> -Jordan >> >>> On Feb 7, 2016, at 1:55 PM, Scott Blum <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> Are we using a new zookeeper? Or did something change with our >>> implementation of KillSession.kill()? >>> >>> Or maybe there's a timing issue with Curator's ConnectionStateManager State >>> change: LOST? I don't understand how we could get a LOST event without the >>> ephemeral node attached to that session having disappeared? >>> >>> On Sun, Feb 7, 2016 at 10:59 AM, Jordan Zimmerman >>> <[email protected] <mailto:[email protected]>> wrote: >>> I don’t know if anything changed in ZooKeeper itself. I know that the >>> connection states changed in Curator, but Curator now tests both the old >>> mode and the new mode and they both fail here. >>> >>> -JZ >>> >>>> On Feb 7, 2016, at 1:06 AM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> >>>> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 branch >>>> is that the ephemeral node /test/me created in testKilledSession() really >>>> isn't disappearing when it should. >>>> >>>> After the session loss and the reconnect, /test still shows 2 children >>>> [foo, me] and /test/me still returns a node. >>>> >>>> Any idea why the timing here would have changed? >>>> >>>> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> https://issues.apache.org/jira/browse/CURATOR-302 >>>> <https://issues.apache.org/jira/browse/CURATOR-302> >>>> >>>> I need to trace through what's really going on under the hood rather than >>>> band-aid the test. Should be able to in next couple of days. >>>> >>>> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman >>>> <[email protected] <mailto:[email protected]>> wrote: >>>> Any update on this? I think you should create a Jira for it. >>>> >>>> -Jordan >>>> >>>> >>>>> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> >>>>> BTW, this test passes on master... so it's some kind of 3.0 vs. master >>>>> issue. I think I'm going to just have to dump in a ton of log messages >>>>> and see what differs. >>>>> >>>>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman >>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>> OK - please create a new Issue in Jira for this. >>>>> >>>>> -Jordan >>>>> >>>>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> >>>>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been >>>>>> broken for a while. Maybe I'll have to git bisect... >>>>>> >>>>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> Okay, so I looked into this for a bit, and I hit kind of a wall. I >>>>>> think there is a legit bug/race in TreeCache, and the following patch >>>>>> *should* remedy: >>>>>> >>>>>> diff --git >>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>> >>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>> index df4403c..a4a022b 100644 >>>>>> --- >>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>> +++ >>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable >>>>>> void wasDeleted() throws Exception >>>>>> { >>>>>> ChildData oldChildData = childData.getAndSet(null); >>>>>> - >>>>>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path); >>>>>> ConcurrentMap<String, TreeNode> childMap = >>>>>> children.getAndSet(null); >>>>>> if ( childMap != null ) >>>>>> { >>>>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable >>>>>> case RECONNECTED: >>>>>> try >>>>>> { >>>>>> + outstandingOps.incrementAndGet(); >>>>>> root.wasReconnected(); >>>>>> >>>>>> publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>>> + if ( outstandingOps.decrementAndGet() == 0 ) >>>>>> + { >>>>>> + if ( isInitialized.compareAndSet(false, true) ) >>>>>> + { >>>>>> + publishEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> + } >>>>>> + } >>>>>> } >>>>>> catch ( Exception e ) >>>>>> { >>>>>> >>>>>> That should guarantee that the initialized event gets deferred until all >>>>>> outstanding refreshes finish.. but it's not. Something seems to have >>>>>> changed under the hood in how background events are getting sent to >>>>>> TreeCache, and I don't really understand it yet. And running the >>>>>> debugger seems to affect the timing, like something racy is going on. :( >>>>>> >>>>>> >>>>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> Ok, that is kind of weird. I'll take a look. >>>>>> >>>>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman >>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>> No, sorry. The last few lines of the test currently are: >>>>>> >>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>> "data".getBytes()); >>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> >>>>>> This fails. But, if I switch them it works: >>>>>> >>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>> "data".getBytes()); >>>>>> >>>>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected] >>>>>>> <mailto:[email protected]>> wrote: >>>>>>> >>>>>>> So you end up with 2 initialized events? >>>>>>> >>>>>>> You mean this? >>>>>>> >>>>>>> assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>>>> + assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>> "data".getBytes()); >>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>> >>>>>>> Seems weird if there are two, but I can help look. >>>>>>> >>>>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman >>>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>>> Hey Scott, >>>>>>> >>>>>>> In this branch, TestTreeCache.testKilledSession() is failing at: >>>>>>> >>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>> "data".getBytes()); >>>>>>> >>>>>>> However, if I change the two asserts to: >>>>>>> >>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>> "data".getBytes()); >>>>>>> >>>>>>> it works. Does that make any sense? >>>>>>> >>>>>>> -Jordan >>>>>>> >>>>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman >>>>>>> > <[email protected] <mailto:[email protected]>> >>>>>>> > wrote: >>>>>>> > >>>>>>> > Devs, >>>>>>> > >>>>>>> > In trying to fix the bad log message "Failed to find watcher” (which >>>>>>> > turns out to be a ZK client issue), I realize that the >>>>>>> > NamespaceWatcher and WatcherWrapper stuff could be improved. I’m >>>>>>> > still working on getting all tests to pass but I’d appreciate more >>>>>>> > sets of eyes on this change. Please review carefully if you can. >>>>>>> > >>>>>>> > https://github.com/apache/curator/pull/131 >>>>>>> > <https://github.com/apache/curator/pull/131> >>>>>>> > >>>>>>> > -Jordan >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> >> > > >
