Any update on this? I think you should create a Jira for it.

-Jordan

> On Feb 5, 2016, at 12:30 PM, Scott Blum <[email protected]> wrote:
> 
> BTW, this test passes on master... so it's some kind of 3.0 vs. master issue. 
>  I think I'm going to just have to dump in a ton of log messages and see what 
> differs.
> 
> On Fri, Feb 5, 2016 at 12:25 PM, Jordan Zimmerman <[email protected] 
> <mailto:[email protected]>> wrote:
> OK - please create a new Issue in Jira for this.
> 
> -Jordan
> 
>> On Feb 5, 2016, at 12:24 PM, Scott Blum <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> BTW: this is broken on CURATOR-3.0 as well, so it appears to have been 
>> broken for a while.  Maybe I'll have to git bisect...
>> 
>> On Fri, Feb 5, 2016 at 12:22 PM, Scott Blum <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Okay, so I looked into this for a bit, and I hit kind of a wall.  I think 
>> there is a legit bug/race in TreeCache, and the following patch *should* 
>> remedy:
>> 
>> diff --git 
>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>>  
>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>> index df4403c..a4a022b 100644
>> --- 
>> a/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>> +++ 
>> b/curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java
>> @@ -303,7 +303,6 @@ public class TreeCache implements Closeable
>>          void wasDeleted() throws Exception
>>          {
>>              ChildData oldChildData = childData.getAndSet(null);
>> -            
>> client.watches().remove(this).ofType(WatcherType.Any).locally().inBackground().forPath(path);
>>              ConcurrentMap<String, TreeNode> childMap = 
>> children.getAndSet(null);
>>              if ( childMap != null )
>>              {
>> @@ -807,8 +806,16 @@ public class TreeCache implements Closeable
>>          case RECONNECTED:
>>              try
>>              {
>> +                outstandingOps.incrementAndGet();
>>                  root.wasReconnected();
>>                  publishEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>> +                if ( outstandingOps.decrementAndGet() == 0 )
>> +                {
>> +                    if ( isInitialized.compareAndSet(false, true) )
>> +                    {
>> +                        publishEvent(TreeCacheEvent.Type.INITIALIZED);
>> +                    }
>> +                }
>>              }
>>              catch ( Exception e )
>>              {
>> 
>> That should guarantee that the initialized event gets deferred until all 
>> outstanding refreshes finish.. but it's not.  Something seems to have 
>> changed under the hood in how background events are getting sent to 
>> TreeCache, and I don't really understand it yet.  And running the debugger 
>> seems to affect the timing, like something racy is going on. :(
>> 
>> 
>> On Fri, Feb 5, 2016 at 11:57 AM, Scott Blum <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Ok, that is kind of weird.  I'll take a look.
>> 
>> On Fri, Feb 5, 2016 at 4:58 AM, Jordan Zimmerman <[email protected] 
>> <mailto:[email protected]>> wrote:
>> No, sorry. The last few lines of the test currently are:
>> 
>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>> 
>> This fails. But, if I switch them it works:
>> 
>> assertEvent(TreeCacheEvent.Type.INITIALIZED);
>> assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", "data".getBytes());
>> 
>>> On Feb 5, 2016, at 2:57 AM, Scott Blum <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> So you end up with 2 initialized events?
>>> 
>>> You mean this?
>>> 
>>>          assertEvent(TreeCacheEvent.Type.CONNECTION_RECONNECTED);
>>> +        assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>          assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>> "data".getBytes());
>>>          assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>> 
>>> Seems weird if there are two, but I can help look.
>>> 
>>> On Thu, Feb 4, 2016 at 10:48 PM, Jordan Zimmerman 
>>> <[email protected] <mailto:[email protected]>> wrote:
>>> Hey Scott,
>>> 
>>> In this branch, TestTreeCache.testKilledSession() is failing at:
>>> 
>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>> "data".getBytes());
>>> 
>>> However, if I change the two asserts to:
>>> 
>>>         assertEvent(TreeCacheEvent.Type.INITIALIZED);
>>>         assertEvent(TreeCacheEvent.Type.NODE_REMOVED, "/test/me", 
>>> "data".getBytes());
>>> 
>>> it works. Does that make any sense?
>>> 
>>> -Jordan
>>> 
>>> > On Feb 4, 2016, at 9:23 PM, Jordan Zimmerman <[email protected] 
>>> > <mailto:[email protected]>> wrote:
>>> >
>>> > Devs,
>>> >
>>> > In trying to fix the bad log message "Failed to find watcher” (which 
>>> > turns out to be a ZK client issue), I realize that the NamespaceWatcher 
>>> > and WatcherWrapper stuff could be improved. I’m still working on getting 
>>> > all tests to pass but I’d appreciate more sets of eyes on this change. 
>>> > Please review carefully if you can.
>>> >
>>> > https://github.com/apache/curator/pull/131 
>>> > <https://github.com/apache/curator/pull/131>
>>> >
>>> > -Jordan
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 

Reply via email to