Github user Randgalt commented on the pull request:
https://github.com/apache/curator/pull/148#issuecomment-220468433
I did some further debugging and the problem has something to do with the
same node being deleted/updated/recreated in quick succession. I'm concerned
that the node states are handle by individual TreeNode instances. Look at this
log (filtered for node "/root/0"):
```
2133 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0,
path='/root/0', name='null', children=null, context=null,
stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135, data=[110, 101,
119], watchedEvent=null, aclList=null} [main-EventThread]
2133 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
2134 processResult: @Node1 CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root/0', name='null', children=[], context=null,
stat=5135,5153,1463694250431,1463694250434,2,0,0,0,3,0,5135
2135 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[3, 0, 7, 9, 8], context=null,
stat=2,2,1463694248754,1463694248754,0,1729,0,0,9,5,5154
2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDataChanged
path:/root/0 [main-EventThread]
2136 process: @Node1 WatchedEvent state:SyncConnected type:NodeDeleted
path:/root/0 [main-EventThread]
2136 publishEvent: @Node1 /root/0 NODE_REMOVED [main-EventThread]
2154 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[0, 6, 5, 4, 8], context=null,
stat=2,2,1463694248754,1463694248754,0,1765,0,0,9,5,5270
2155 processResult: @Node1 CuratorEventImpl{type=GET_DATA, resultCode=0,
path='/root/0', name='null', children=null, context=null,
stat=5268,5268,1463694250454,1463694250454,0,0,0,0,9,0,5268
2155 publishEvent: @Node1 /root/0 NODE_ADDED [main-EventThread]
2167 processResult: @Node2 CuratorEventImpl{type=GET_DATA, resultCode=-101,
path='/root/0', name='null', children=null, context=null, stat=null, data=null,
watchedEvent=null, aclList=null} [main-EventThread]
2167 wasDeleted @Node2 /root/0 PENDING [main-EventThread]
2167 processResult: @Node2 CuratorEventImpl{type=CHILDREN, resultCode=-101,
path='/root/0', name='null', children=[], context=null, stat=null, data=null,
watchedEvent=null, aclList=null} [main-EventThread]
2167 wasDeleted @Node2 /root/0 DEAD [main-EventThread]
2183 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[3, 7, 5, 9, 8], context=null,
stat=2,2,1463694248754,1463694248754,0,1805,0,0,9,5,5380
2233 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[0, 7, 5, 4], context=null,
stat=2,2,1463694248754,1463694248754,0,1842,0,0,9,4,5484
2259 processResult: @Node3 CuratorEventImpl{type=GET_DATA, resultCode=-101,
path='/root/0', name='null', children=null, context=null, stat=null, data=null,
watchedEvent=null, aclList=null} [main-EventThread]
2259 wasDeleted @Node3 /root/0 PENDING [main-EventThread]
2259 processResult: @Node3 CuratorEventImpl{type=CHILDREN, resultCode=-101,
path='/root/0', name='null', children=[], context=null, stat=null, data=null,
watchedEvent=null, aclList=null} [main-EventThread]
2260 wasDeleted @Node3 /root/0 DEAD [main-EventThread]
2261 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[7, 6, 9], context=null,
stat=2,2,1463694248754,1463694248754,0,1875,0,0,9,3,5604
2273 processResult: @root CuratorEventImpl{type=CHILDREN, resultCode=0,
path='/root', name='null', children=[1, 0, 5, 9], context=null,
stat=2,2,1463694248754,1463694248754,0,1906,0,0,9,4,5706
2282 processResult: @Node4 CuratorEventImpl{type=GET_DATA, resultCode=0,
path='/root/0', name='null', children=null, context=null,
stat=5797,5811,1463694250589,1463694250591,1,0,0,0,3,0,5797
2282 publishEvent:@Node4 /root/0 NODE_ADDED [main-EventThread]
```
* The first column is time
* The second column is the operation
* The third is the TreeNode doing the operation (I made the name human
readable)
* The last bit is the event - wasDeleted means that wasDeleted() did not
publish NODE_REMOVED because the node state was not LIVE
The problem here is the two NODE_ADDED publishes at 2155 and 2282. I
believe there is a missing NODE_REMOVED event at 2260.
Note, at time 2136 we get watcher process() NodeDataChanged immediately
followed by a NodeDeleted on @Node1's watcher.
@Node1 then publishes NODE_REMOVED (correctly) at 2136.
At 2154, @root gets the result of calling getChildren. I assume that
"/root/0" was recreated and @root's watcher got notified. Note that
"0" is in the children list. So, this will cause a new node to get created
and its wasCreated() called. At 2155, the old @Node1 gets the
result of GET_DATA. This causes a false NODE_ADDED to get generated. @Node1
is officially dead and should not be reporting this.
I believe the sequence of events is: Just prior to 2136, 1 test code thread
changes "/root/0" and another deletes it and another recreates it almost
immediately.
@Node1's watcher is triggered for NodeDataChanged and begins an async call
to getData(), it then receives NodeDeleted BEFORE the async getData() occurs on
the server.
The node then gets recreated. The getData() finally executes on the server
and the result is returned such that @Node1 misses the node deletion.
Anyway, I have a potential fix for now. In processResults() GET_DATA case,
add:
```
if ( nodeState.get() == NodeState.DEAD )
{
break;
}
```
This fixes the problem. Longer term, we should consider moving node
management into the main TreeCache instance and not the individual node objects.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---