Re: PLEASE REVIEW - Major re-work of Watcher wrappers

Jordan Zimmerman Sun, 07 Feb 2016 08:00:07 -0800

I don’t know if anything changed in ZooKeeper itself. I know that the 
connection states changed in Curator, but Curator now tests both the old mode 
and the new mode and they both fail here.


-JZ

> On Feb 7, 2016, at 1:06 AM, Scott Blum <[email protected]> wrote:
> 
> I need to analyze this a bit deeper, but what I'm seeing on the 3.0 branch is 
> that the ephemeral node /test/me created in testKilledSession() really isn't 
> disappearing when it should.
> 
> After the session loss and the reconnect, /test still shows 2 children [foo, 
> me] and /test/me still returns a node.
> 
> Any idea why the timing here would have changed?
> 
> On Sat, Feb 6, 2016 at 1:41 PM, Scott Blum <[email protected] 
> <mailto:[email protected]>> wrote:
> https://issues.apache.org/jira/browse/CURATOR-302 
> <https://issues.apache.org/jira/browse/CURATOR-302>
> 
> I need to trace through what's really going on under the hood rather than 
> band-aid the test.  Should be able to in next couple of days.
> 
> On Sat, Feb 6, 2016 at 7:43 AM, Jordan Zimmerman <[email protected] 
> <mailto:[email protected]>> wrote:
> Any update on this? I think you should create a Jira for it.
> 
> -Jordan
> 
> 
>> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> BTW, this test passes on master... so it's some kind of 3.0 vs. master 
>> issue.  I think I'm going to just have to dump in a ton of log messages and 
>> see what differs.
>> 
>> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman 
>> <[email protected] <mailto:[email protected]>> wrote:
>> OK - please create a new Issue in Jira for this.
>> 
>> -Jordan
>> 
>>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been 
>>> broken for a while.  Maybe I'll have to git bisect...
>>> 
>>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Okay, so I looked into this for a bit, and I hit kind of a wall.  I think 
>>> there is a legit bug/race in TreeCache, and the following patch *should* 
>>> remedy:
>>> 
>>> diff --git 
>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>>  
>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> index df4403c..a4a022b 100644
>>> --- 
>>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> +++ 
>>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable
>>>          void wasDeleted() throws Exception
>>>          {
>>>              ChildData oldChildData = childData.getAndSet(null);
>>> -            
>>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path);
>>>              ConcurrentMap<String, TreeNode> childMap = 
>>> children.getAndSet(null);
>>>              if ( childMap != null )
>>>              {
>>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable
>>>          case RECONNECTED:
>>>              try
>>>              {
>>> +                outstandingOps.incrementAndGet();
>>>                  root.wasReconnected();
>>>                  publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>> +                if ( outstandingOps.decrementAndGet() == 0 )
>>> +                {
>>> +                    if ( isInitialized.compareAndSet(false, true) )
>>> +                    {
>>> +                        publishEvent(TreeCacheEvent.Type.INITIALIZED);
>>> +                    }
>>> +                }
>>>              }
>>>              catch ( Exception e )
>>>              {
>>> 
>>> That should guarantee that the initialized event gets deferred until all 
>>> outstanding refreshes finish.. but it's not.  Something seems to have 
>>> changed under the hood in how background events are getting sent to 
>>> TreeCache, and I don't really understand it yet.  And running the debugger 
>>> seems to affect the timing, like something racy is going on. :(
>>> 
>>> 
>>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Ok, that is kind of weird.  I'll take a look.
>>> 
>>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman 
>>> <[email protected] <mailto:[email protected]>> wrote:
>>> No, sorry. The last few lines of the test currently are:
>>> 
>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>> "data".getBytes());
>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>> 
>>> This fails. But, if I switch them it works:
>>> 
>>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>> "data".getBytes());
>>> 
>>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> So you end up with 2 initialized events?
>>>> 
>>>> You mean this?
>>>> 
>>>>          assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>>> +        assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>          assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>>> "data".getBytes());
>>>>          assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>> 
>>>> Seems weird if there are two, but I can help look.
>>>> 
>>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman 
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>> Hey Scott,
>>>> 
>>>> In this branch, TestTreeCache.testKilledSession() is failing at:
>>>> 
>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>>> "data".getBytes());
>>>> 
>>>> However, if I change the two asserts to:
>>>> 
>>>>         assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>>> "data".getBytes());
>>>> 
>>>> it works. Does that make any sense?
>>>> 
>>>> -Jordan
>>>> 
>>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman <[email protected] 
>>>> > <mailto:[email protected]>> wrote:
>>>> >
>>>> > Devs,
>>>> >
>>>> > In trying to fix the bad log message "Failed to find watcher” (which 
>>>> > turns out to be a ZK client issue), I realize that the NamespaceWatcher 
>>>> > and WatcherWrapper stuff could be improved. I’m still working on getting 
>>>> > all tests to pass but I’d appreciate more sets of eyes on this change. 
>>>> > Please review carefully if you can.
>>>> >
>>>> > https://github.com/apache/curator/pull/131 
>>>> > <https://github.com/apache/curator/pull/131>
>>>> >
>>>> > -Jordan
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
>

Re: PLEASE REVIEW - Major re-work of Watcher wrappers

Reply via email to