Oooh... let me try that. On Sun, Feb 7, 2016 at 8:29 PM, Jordan Zimmerman <[email protected] > wrote:
> FYI - I added a new method to Timing to help with this: > > forSessionSleep() > > The new version of KillSession inserts an eventOfDeath directly into the > client. So, no, it’s not a real session kill like it used to be. But, now > it’s more reliable. Use timing.forSessionSleep() to wait for session > timeout. > > -JZ > > > On Feb 7, 2016, at 8:21 PM, Scott Blum <[email protected]> wrote: > > Can you describe the change then? Because kill session doesn't seem to now > ensure that ephemeral nodes bound the the killed session disappear in a > timely manner > On Feb 7, 2016 8:03 PM, "Jordan Zimmerman" <[email protected]> > wrote: > >> Are we using a new zookeeper? >> >> >> In Curator 3.0 with “new” connection state handling which simulates a >> Session timeout. However, all tests are run twice. First with the old >> handling and then with the new. >> >> Or did something change with our implementation of KillSession.kill()? >> >> >> KillSession did change though. A change I made to ZK got added in 3.5 and >> we now use that. >> >> -Jordan >> >> On Feb 7, 2016, at 1:55 PM, Scott Blum <[email protected]> wrote: >> >> Are we using a new zookeeper? Or did something change with our >> implementation of KillSession.kill()? >> >> Or maybe there's a timing issue with Curator's ConnectionStateManager >> State change: LOST? I don't understand how we could get a LOST event >> without the ephemeral node attached to that session having disappeared? >> >> On Sun, Feb 7, 2016 at 10:59 AM, Jordan Zimmerman < >> [email protected]> wrote: >> >>> I don’t know if anything changed in ZooKeeper itself. I know that the >>> connection states changed in Curator, but Curator now tests both the old >>> mode and the new mode and they both fail here. >>> >>> -JZ >>> >>> On Feb 7, 2016, at 1:06 AM, Scott Blum <[email protected]> wrote: >>> >>> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 >>> branch is that the ephemeral node /test/me created in >>> testKilledSession() really isn't disappearing when it should. >>> >>> After the session loss and the reconnect, /test still shows 2 children >>> [foo, me] and /test/me still returns a node. >>> >>> Any idea why the timing here would have changed? >>> >>> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <[email protected]> >>> wrote: >>> >>>> https://issues.apache.org/jira/browse/CURATOR-302 >>>> >>>> I need to trace through what's really going on under the hood rather >>>> than band-aid the test. Should be able to in next couple of days. >>>> >>>> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman < >>>> [email protected]> wrote: >>>> >>>>> Any update on this? I think you should create a Jira for it. >>>>> >>>>> -Jordan >>>>> >>>>> >>>>> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected]> wrote: >>>>> >>>>> BTW, this test passes on master... so it's some kind of 3.0 vs. master >>>>> issue. I think I'm going to just have to dump in a ton of log messages >>>>> and >>>>> see what differs. >>>>> >>>>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman < >>>>> [email protected]> wrote: >>>>> >>>>>> OK - please create a new Issue in Jira for this. >>>>>> >>>>>> -Jordan >>>>>> >>>>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected]> >>>>>> wrote: >>>>>> >>>>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have >>>>>> been broken for a while. Maybe I'll have to git bisect... >>>>>> >>>>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Okay, so I looked into this for a bit, and I hit kind of a wall. I >>>>>>> think there is a legit bug/race in TreeCache, and the following patch >>>>>>> *should* remedy: >>>>>>> >>>>>>> diff --git >>>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>>> index df4403c..a4a022b 100644 >>>>>>> --- >>>>>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>>> +++ >>>>>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>>>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable >>>>>>> void wasDeleted() throws Exception >>>>>>> { >>>>>>> ChildData oldChildData = childData.getAndSet(null); >>>>>>> - >>>>>>> >>>>>>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path); >>>>>>> ConcurrentMap<String, TreeNode> childMap = >>>>>>> children.getAndSet(null); >>>>>>> if ( childMap != null ) >>>>>>> { >>>>>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable >>>>>>> case RECONNECTED: >>>>>>> try >>>>>>> { >>>>>>> + outstandingOps.incrementAndGet(); >>>>>>> root.wasReconnected(); >>>>>>> >>>>>>> publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>>>> + if ( outstandingOps.decrementAndGet() == 0 ) >>>>>>> + { >>>>>>> + if ( isInitialized.compareAndSet(false, true) ) >>>>>>> + { >>>>>>> + >>>>>>> publishEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>> + } >>>>>>> + } >>>>>>> } >>>>>>> catch ( Exception e ) >>>>>>> { >>>>>>> >>>>>>> That should guarantee that the initialized event gets deferred until >>>>>>> all outstanding refreshes finish.. but it's not. Something seems to >>>>>>> have >>>>>>> changed under the hood in how background events are getting sent to >>>>>>> TreeCache, and I don't really understand it yet. And running the >>>>>>> debugger >>>>>>> seems to affect the timing, like something racy is going on. :( >>>>>>> >>>>>>> >>>>>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Ok, that is kind of weird. I'll take a look. >>>>>>>> >>>>>>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> No, sorry. The last few lines of the test currently are: >>>>>>>>> >>>>>>>>> >>>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>>>> "data".getBytes()); >>>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>>>> >>>>>>>>> This fails. But, if I switch them it works: >>>>>>>>> >>>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>>>> >>>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>>>> "data".getBytes()); >>>>>>>>> >>>>>>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> So you end up with 2 initialized events? >>>>>>>>> >>>>>>>>> You mean this? >>>>>>>>> >>>>>>>>> assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>>>>>> + assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>>>> "data".getBytes()); >>>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>>>> >>>>>>>>> Seems weird if there are two, but I can help look. >>>>>>>>> >>>>>>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hey Scott, >>>>>>>>>> >>>>>>>>>> In this branch, TestTreeCache.testKilledSession() is failing at: >>>>>>>>>> >>>>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>>>>> "data".getBytes()); >>>>>>>>>> >>>>>>>>>> However, if I change the two asserts to: >>>>>>>>>> >>>>>>>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>>>>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>>>>>>> "data".getBytes()); >>>>>>>>>> >>>>>>>>>> it works. Does that make any sense? >>>>>>>>>> >>>>>>>>>> -Jordan >>>>>>>>>> >>>>>>>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> > >>>>>>>>>> > Devs, >>>>>>>>>> > >>>>>>>>>> > In trying to fix the bad log message "Failed to find watcher” >>>>>>>>>> (which turns out to be a ZK client issue), I realize that the >>>>>>>>>> NamespaceWatcher and WatcherWrapper stuff could be improved. I’m >>>>>>>>>> still >>>>>>>>>> working on getting all tests to pass but I’d appreciate more sets of >>>>>>>>>> eyes >>>>>>>>>> on this change. Please review carefully if you can. >>>>>>>>>> > >>>>>>>>>> > https://github.com/apache/curator/pull/131 >>>>>>>>>> > >>>>>>>>>> > -Jordan >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> >> >
