Github user Randgalt commented on the pull request:

    https://github.com/apache/curator/pull/148#issuecomment-220468433
  
    I did some further debugging and the problem has something to do with the 
same node being deleted/updated/recreated in quick succession. I'm concerned 
that the node states are handle by individual TreeNode instances. Look at this 
log (filtered for node "/root/0"):
    
    ```
    2133 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0, 
path='/root/0', name='null', children=null, context=null, 
stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135, data=[110, 101, 
119], watchedEvent=null, aclList=null} [main-EventThread]
    2133 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
    2134 processResult: @Node1 CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root/0', name='null', children=[], context=null, 
stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135
    2135 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[3, 0, 7, 9, 8], context=null, 
stat=2,2,1463694248754,1463694248754,0,1729,0,0,9,5,5154
    2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/root/0 [main-EventThread]
    2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDeleted 
path:/root/0 [main-EventThread]
    2136 publishEvent: @Node1 /root/0 NODE_REMOVED [main-EventThread]
    2154 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[0, 6, 5, 4, 8], context=null, 
stat=2,2,1463694248754,1463694248754,0,1765,0,0,9,5,5270
    2155 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0, 
path='/root/0', name='null', children=null, context=null, 
stat=5268,5268,1463694250454,1463694250454,0,0,0,0,9,0,5268
    2155 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
    2167 processResult: @Node2 CuratorEventImpl{type=GET_DATA, resultCode=-101, 
path='/root/0', name='null', children=null, context=null, stat=null, data=null, 
watchedEvent=null, aclList=null} [main-EventThread]
    2167 wasDeleted @Node2 /root/0 PENDING [main-EventThread]
    2167 processResult: @Node2 CuratorEventImpl{type=CHILDREN, resultCode=-101, 
path='/root/0', name='null', children=[], context=null, stat=null, data=null, 
watchedEvent=null, aclList=null} [main-EventThread]
    2167 wasDeleted @Node2 /root/0 DEAD [main-EventThread]
    2183 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[3, 7, 5, 9, 8], context=null, 
stat=2,2,1463694248754,1463694248754,0,1805,0,0,9,5,5380
    2233 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[0, 7, 5, 4], context=null, 
stat=2,2,1463694248754,1463694248754,0,1842,0,0,9,4,5484
    2259 processResult: @Node3 CuratorEventImpl{type=GET_DATA, resultCode=-101, 
path='/root/0', name='null', children=null, context=null, stat=null, data=null, 
watchedEvent=null, aclList=null} [main-EventThread]
    2259 wasDeleted @Node3 /root/0 PENDING [main-EventThread]
    2259 processResult: @Node3 CuratorEventImpl{type=CHILDREN, resultCode=-101, 
path='/root/0', name='null', children=[], context=null, stat=null, data=null, 
watchedEvent=null, aclList=null} [main-EventThread]
    2260 wasDeleted @Node3 /root/0 DEAD [main-EventThread]
    2261 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[7, 6, 9], context=null, 
stat=2,2,1463694248754,1463694248754,0,1875,0,0,9,3,5604
    2273 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0, 
path='/root', name='null', children=[1, 0, 5, 9], context=null, 
stat=2,2,1463694248754,1463694248754,0,1906,0,0,9,4,5706
    2282 processResult: @Node4 CuratorEventImpl{type=GET_DATA, resultCode=0, 
path='/root/0', name='null', children=null, context=null, 
stat=5797,5811,1463694250589,1463694250591,1,0,0,0,3,0,5797
    2282 publishEvent:@Node4 /root/0 NODE_ADDED [main-EventThread]
    ```
    
    * The first column is time
    * The second column is the operation
    * The third is the TreeNode doing the operation (I made the name human 
readable) 
    * The last bit is the event - wasDeleted means that wasDeleted() did not 
publish NODE_REMOVED because the node state was not LIVE
    
    The problem here is the two NODE_ADDED publishes at 2155 and 2282. I 
believe there is a missing NODE_REMOVED event at 2260.
    
    Note, at time 2136 we get watcher process() NodeDataChanged immediately 
followed by a NodeDeleted on @Node1's watcher. 
    @Node1 then publishes NODE_REMOVED (correctly) at 2136.
    
    At 2154, @root gets the result of calling getChildren. I assume that 
"/root/0" was recreated and @root's watcher got notified. Note that
    "0" is in the children list. So, this will cause a new node to get created 
and its wasCreated() called. At 2155, the old @Node1 gets the
    result of GET_DATA. This causes a false NODE_ADDED to get generated. @Node1 
is officially dead and should not be reporting this.
    
    I believe the sequence of events is: Just prior to 2136, 1 test code thread 
changes "/root/0" and another deletes it and another recreates it almost 
immediately.
    @Node1's watcher is triggered for NodeDataChanged and begins an async call 
to getData(), it then receives NodeDeleted BEFORE the async getData() occurs on 
the server.
    The node then gets recreated. The getData() finally executes on the server 
and the result is returned such that @Node1 misses the node deletion.
    
    Anyway, I have a potential fix for now. In processResults() GET_DATA case, 
add:
    
    ```
    if ( nodeState.get() == NodeState.DEAD )
    {
        break;
    }
    ```
    
    This fixes the problem. Longer term, we should consider moving node 
management into the main TreeCache instance and not the individual node objects.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to