[
https://issues.apache.org/jira/browse/ZOOKEEPER-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231654#comment-13231654
]
Mihai Claudiu Toader commented on ZOOKEEPER-1424:
-------------------------------------------------
Right now I'm leaving for a trip, but as soon as I get to a computer I'll do that.
No later than 22nd March.
> ZooKeeper will not allow a client to delete a tree when it should allow it
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1424
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1424
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.2
> Environment: Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two
> Java clients
> Reporter: Mihai Claudiu Toader
>
> Hi all,
> While using ZooKeeper at Midokura we hit an interesting bug in ZooKeeper. We
> hit it sporadically while developing some functional tests, so I had to
> build a test case for it.
> I finally created the test case and I think I have narrowed down the
> conditions under which it happens.
> So I wanted to let you know my findings, since they are somewhat troublesome.
> We need:
> - one running ZooKeeper server (I didn't test this with a cluster)
>   let's name this: the server
> - one running ZooKeeper client that will create an ephemeral node under the
>   tree created by the next client
>   let's name this: the ephemeral client
> - one running ZooKeeper client that will create a persistent tree and try
>   to delete that tree
>   let's name this: the persistent client
> What needs to happen is this:
> step 1. - the server starts
> step 2. - the persistent client connects and creates a tree
> step 3. - the ephemeral client connects and adds an ephemeral node under the
> tree created by the persistent client
> step 4. - the persistent client tries to delete the tree recursively
> (without including the ephemeral node in the multi op)
> step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
> step 6. - the persistent client tries to delete the tree recursively
> again (and fails with NotEmpty even though listing the node shows no
> children)
> - the ZooKeeper server needs to be restarted before the delete will succeed.
> Step 4 is critical: if we skip it (so there is no previous error while
> trying to remove the tree), then the next steps behave as we
> would expect them to behave (that is, they pass).
> Also, no amount of fiddling with the ZooKeeper connection timeouts (between
> ZooKeeper and the ephemeral node) helps.
>
> If the ephemeral client is shut down properly, everything seems to
> behave properly (even with step 4).
> The test code is available here:
> https://github.com/mtoadermido/play
> It needs ZooKeeper 3.4.2 installed on the system (it uses the installed
> jars from the deb to spawn the ZooKeeper server).
> The entry point is
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java
> There is a lot of boilerplate, since I didn't want it to depend on
> anything from midonet, but the interesting part is the BlockingBug.main() method.
> It launches a ZooKeeper process and an external ephemeral client process, and
> after that acts as the second client.
> Available tweaks:
> - the ZooKeeper client timeout for the ephemeral client here:
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56
> - step 4 here (set to true / false):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69
> - the shutdown of the ephemeral client (soft, aka clean shutdown, or hard,
> aka kill -9):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88
> The message displayed depends on whether the final recursive deletion
> succeeded or not:
> - "We hit it !. The clear tree failed." (the final deletion failed):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103
> - "No error :(" (the final deletion succeeded):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99
> The conclusion is that the bug seems to be inside the ZooKeeper codebase, and
> that it is triggered by this particular usage of ZooKeeper combined with the
> misfortune of having to kill the ephemeral process hard.
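
To make the expected outcome of steps 2 to 6 in the quoted report concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not ZooKeeper code and not taken from the BlockingBug test: the class TreeDeleteModel, its methods, and the paths /tree, /tree/a and /tree/a/eph are all hypothetical names chosen for illustration. It assumes that a multi op is all-or-nothing and that deleting a node which still has children fails with a "not empty" result. It only shows what one would expect a correct server to answer; it does not reproduce the bug.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy in-memory model of a znode tree. This is NOT ZooKeeper code; every
// name here is hypothetical and exists only to illustrate the steps.
class TreeDeleteModel {
    enum Result { OK, NO_NODE, NOT_EMPTY }

    // Full paths of all nodes that currently exist.
    private final TreeSet<String> nodes = new TreeSet<>();

    void create(String path) {
        nodes.add(path);
    }

    // Models the server dropping an ephemeral node once its session is gone.
    void expire(String path) {
        nodes.remove(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    // All-or-nothing delete of the given paths, applied in list order.
    // If any single delete would fail, the tree is left unchanged.
    Result multiDelete(List<String> paths) {
        TreeSet<String> scratch = new TreeSet<>(nodes);
        for (String path : paths) {
            if (!scratch.contains(path)) {
                return Result.NO_NODE;
            }
            String childPrefix = path + "/";
            boolean hasChild = scratch.stream().anyMatch(p -> p.startsWith(childPrefix));
            if (hasChild) {
                return Result.NOT_EMPTY;
            }
            scratch.remove(path);
        }
        // Every delete succeeded on the copy, so commit the result.
        nodes.clear();
        nodes.addAll(scratch);
        return Result.OK;
    }

    public static void main(String[] args) {
        TreeDeleteModel server = new TreeDeleteModel();
        // Step 2: the persistent client creates a tree.
        server.create("/tree");
        server.create("/tree/a");
        // Step 3: the ephemeral client adds a node under that tree.
        server.create("/tree/a/eph");
        // Step 4: children-first delete list that omits the ephemeral node.
        List<String> ops = Arrays.asList("/tree/a", "/tree");
        System.out.println("step 4: " + server.multiDelete(ops));
        // Step 5: the ephemeral client dies and its node goes away.
        server.expire("/tree/a/eph");
        // Step 6: the same delete list is retried.
        System.out.println("step 6: " + server.multiDelete(ops));
    }
}
```

In this model step 4 prints NOT_EMPTY and step 6 prints OK. The report above is that a real 3.4.2 server still answers NotEmpty at step 6 until it is restarted.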