> Are we using a new zookeeper?
In Curator 3.0 with “new” connection state handling which simulates a Session timeout. However, all tests are run twice. First with the old handling and then with the new. > Or did something change with our implementation of KillSession.kill()? KillSession did change though. A change I made to ZK got added in 3.5 and we now use that. -Jordan > On Feb 7, 2016, at 1:55 PM, Scott Blum <[email protected]> wrote: > > Are we using a new zookeeper? Or did something change with our > implementation of KillSession.kill()? > > Or maybe there's a timing issue with Curator's ConnectionStateManager State > change: LOST? I don't understand how we could get a LOST event without the > ephemeral node attached to that session having disappeared? > > On Sun, Feb 7, 2016 at 10:59 AM, Jordan Zimmerman <[email protected] > <mailto:[email protected]>> wrote: > I don’t know if anything changed in ZooKeeper itself. I know that the > connection states changed in Curator, but Curator now tests both the old mode > and the new mode and they both fail here. > > -JZ > >> On Feb 7, 2016, at 1:06 AM, Scott Blum <[email protected] >> <mailto:[email protected]>> wrote: >> >> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 branch >> is that the ephemeral node /test/me created in testKilledSession() really >> isn't disappearing when it should. >> >> After the session loss and the reconnect, /test still shows 2 children [foo, >> me] and /test/me still returns a node. >> >> Any idea why the timing here would have changed? >> >> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <[email protected] >> <mailto:[email protected]>> wrote: >> https://issues.apache.org/jira/browse/CURATOR-302 >> <https://issues.apache.org/jira/browse/CURATOR-302> >> >> I need to trace through what's really going on under the hood rather than >> band-aid the test. Should be able to in next couple of days. >> >> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman <[email protected] >> <mailto:[email protected]>> wrote: >> Any update on this? I think you should create a Jira for it. >> >> -Jordan >> >> >>> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> BTW, this test passes on master... so it's some kind of 3.0 vs. master >>> issue. I think I'm going to just have to dump in a ton of log messages and >>> see what differs. >>> >>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman >>> <[email protected] <mailto:[email protected]>> wrote: >>> OK - please create a new Issue in Jira for this. >>> >>> -Jordan >>> >>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> >>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been >>>> broken for a while. Maybe I'll have to git bisect... >>>> >>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> Okay, so I looked into this for a bit, and I hit kind of a wall. I think >>>> there is a legit bug/race in TreeCache, and the following patch *should* >>>> remedy: >>>> >>>> diff --git >>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>> >>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>> index df4403c..a4a022b 100644 >>>> --- >>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>> +++ >>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java >>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable >>>> void wasDeleted() throws Exception >>>> { >>>> ChildData oldChildData = childData.getAndSet(null); >>>> - >>>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path); >>>> ConcurrentMap<String, TreeNode> childMap = >>>> children.getAndSet(null); >>>> if ( childMap != null ) >>>> { >>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable >>>> case RECONNECTED: >>>> try >>>> { >>>> + outstandingOps.incrementAndGet(); >>>> root.wasReconnected(); >>>> publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>> + if ( outstandingOps.decrementAndGet() == 0 ) >>>> + { >>>> + if ( isInitialized.compareAndSet(false, true) ) >>>> + { >>>> + publishEvent(TreeCacheEvent.Type.INITIALIZED); >>>> + } >>>> + } >>>> } >>>> catch ( Exception e ) >>>> { >>>> >>>> That should guarantee that the initialized event gets deferred until all >>>> outstanding refreshes finish.. but it's not. Something seems to have >>>> changed under the hood in how background events are getting sent to >>>> TreeCache, and I don't really understand it yet. And running the debugger >>>> seems to affect the timing, like something racy is going on. :( >>>> >>>> >>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected] >>>> <mailto:[email protected]>> wrote: >>>> Ok, that is kind of weird. I'll take a look. >>>> >>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman >>>> <[email protected] <mailto:[email protected]>> wrote: >>>> No, sorry. The last few lines of the test currently are: >>>> >>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>> "data".getBytes()); >>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>> >>>> This fails. But, if I switch them it works: >>>> >>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>> "data".getBytes()); >>>> >>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected] >>>>> <mailto:[email protected]>> wrote: >>>>> >>>>> So you end up with 2 initialized events? >>>>> >>>>> You mean this? >>>>> >>>>> assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED); >>>>> + assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>> "data".getBytes()); >>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> >>>>> Seems weird if there are two, but I can help look. >>>>> >>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman >>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>> Hey Scott, >>>>> >>>>> In this branch, TestTreeCache.testKilledSession() is failing at: >>>>> >>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>> "data".getBytes()); >>>>> >>>>> However, if I change the two asserts to: >>>>> >>>>> assertEvent(TreeCacheEvent.Type.INITIALIZED); >>>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", >>>>> "data".getBytes()); >>>>> >>>>> it works. Does that make any sense? >>>>> >>>>> -Jordan >>>>> >>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman >>>>> > <[email protected] <mailto:[email protected]>> wrote: >>>>> > >>>>> > Devs, >>>>> > >>>>> > In trying to fix the bad log message "Failed to find watcher” (which >>>>> > turns out to be a ZK client issue), I realize that the NamespaceWatcher >>>>> > and WatcherWrapper stuff could be improved. I’m still working on >>>>> > getting all tests to pass but I’d appreciate more sets of eyes on this >>>>> > change. Please review carefully if you can. >>>>> > >>>>> > https://github.com/apache/curator/pull/131 >>>>> > <https://github.com/apache/curator/pull/131> >>>>> > >>>>> > -Jordan >>>>> >>>>> >>>> >>>> >>>> >>>> >>> >>> >> >> >> > >
