[
https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789692#comment-13789692
]
Thawan Kooburat commented on ZOOKEEPER-1624:
--------------------------------------------
As I commented earlier, the current Java test doesn't actually catch the
bug because of a timing issue. I guess I will have to rewrite it to test
PrepRequestProcessor directly (which probably will not rely on
ZOOKEEPER-1572).
If you want to commit this now, the patch itself has a proper and reliable (at
least on my box) unit test in C. Our test infrastructure does run the C unit
tests and report the results, right? I agree with Camile that it would be nice
to have a Java test for server-side functionality, but it isn't strictly
needed, right?
> PrepRequestProcessor abort multi-operation incorrectly
> ------------------------------------------------------
>
> Key: ZOOKEEPER-1624
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Thawan Kooburat
> Assignee: Thawan Kooburat
> Priority: Critical
> Labels: zk-review
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch,
> ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch
>
>
> We found this issue when trying to issue multiple instances of the following
> multi-op concurrently:
> multi {
> 1. create sequential node /a-
> 2. create node /b
> }
> The expected result is that only the first multi-op request should succeed
> and the rest of the requests should fail because /b already exists.
> However, the reported result is that the subsequent multi-ops failed because
> the sequential node creation failed, which should not be possible.
> Below are the return codes for each sub-op when issuing 3 instances of the
> above multi-op asynchronously:
> 1. ZOK, ZOK
> 2. ZOK, ZNODEEXISTS
> 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY
> After I added more debug logging, the cause turned out to be that
> PrepRequestProcessor rolls back the outstandingChanges of the second multi-op
> incorrectly, which makes sequential node name generation incorrect. Below are
> the sequential node names generated by PrepRequestProcessor:
> 1. create /a-0001
> 2. create /a-0003
> 3. create /a-0001
> The bug is in the getPendingChanges() method. It fails to copy the
> ChangeRecord for the parent node ("/"), so rollbackPendingChanges() cannot
> restore the correct previous change record of the parent node when aborting
> the second multi-op.
> The impact of this bug is that sequential node creation on the same parent
> node may fail until the previous one is committed. I am not sure whether
> there are other implications.
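
The rollback problem described above can be illustrated with a small
standalone model. This is only a sketch, not ZooKeeper's code: the class
MultiOpRollbackSketch, the Change type, and the copyParent flag are
hypothetical, and it assumes that a sequential name is derived from the
parent's child version, that a lookup falls back to committed state when no
in-memory record exists, and that "/" starts at child version 1 so the names
match the ones listed in the description (shortened to 4 digits).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the pending-change bookkeeping.
class MultiOpRollbackSketch {
    // Stand-in for a ChangeRecord: path, owning zxid, parent child version.
    static final class Change {
        final String path; final long zxid; final int cversion;
        Change(String path, long zxid, int cversion) {
            this.path = path; this.zxid = zxid; this.cversion = cversion;
        }
    }

    static final int COMMITTED_ROOT_CVERSION = 1; // assumed; nothing commits during the run
    final List<Change> outstanding = new ArrayList<>();
    final Map<String, Change> outstandingForPath = new HashMap<>();
    final List<String> generated = new ArrayList<>(); // sequential names, in order
    final boolean copyParent; // true models the fix, false models the bug

    MultiOpRollbackSketch(boolean copyParent) { this.copyParent = copyParent; }

    // Latest record for "/", falling back to the (assumed) committed state.
    Change rootRecord() {
        Change c = outstandingForPath.get("/");
        return c != null ? c : new Change("/", -1, COMMITTED_ROOT_CVERSION);
    }

    void add(Change c) { outstanding.add(c); outstandingForPath.put(c.path, c); }

    // Creates a child of "/"; returns false if the node already exists.
    boolean create(long zxid, String path, boolean sequential) {
        Change parent = rootRecord();
        if (sequential) {
            path = path + String.format("%04d", parent.cversion);
            generated.add(path);
        }
        if (outstandingForPath.containsKey(path)) return false;
        add(new Change("/", zxid, parent.cversion + 1));
        add(new Change(path, zxid, 0));
        return true;
    }

    // Snapshot of the records a multi-op may overwrite.
    Map<String, Change> getPendingChanges(String[] paths) {
        Map<String, Change> pending = new HashMap<>();
        for (String p : paths) {
            Change c = outstandingForPath.get(p);
            if (c != null) pending.put(p, c);
            Change parent = outstandingForPath.get("/");
            if (copyParent && parent != null) pending.put("/", parent);
        }
        return pending;
    }

    // Drops every record of the aborted zxid, then restores the snapshot.
    void rollbackPendingChanges(long zxid, Map<String, Change> pending) {
        for (int i = outstanding.size() - 1;
                i >= 0 && outstanding.get(i).zxid == zxid; i--) {
            outstandingForPath.remove(outstanding.remove(i).path);
        }
        outstandingForPath.putAll(pending);
    }

    void multi(long zxid) {
        Map<String, Change> pending = getPendingChanges(new String[] {"/a-", "/b"});
        boolean ok = create(zxid, "/a-", true) && create(zxid, "/b", false);
        if (!ok) rollbackPendingChanges(zxid, pending);
    }

    // Issues the multi-op three times before anything is committed.
    static List<String> run(boolean copyParent) {
        MultiOpRollbackSketch s = new MultiOpRollbackSketch(copyParent);
        for (long zxid = 1; zxid <= 3; zxid++) s.multi(zxid);
        return s.generated;
    }

    public static void main(String[] args) {
        System.out.println("parent record not copied: " + run(false));
        System.out.println("parent record copied:     " + run(true));
    }
}
```

In this model, run(false) loses the parent record when the second multi-op is
rolled back, so the third one falls back to the committed child version and
reuses the first name. With run(true) the parent record is restored and the
third multi-op picks the same name the aborted second one had used.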
--
This message was sent by Atlassian JIRA
(v6.1#6144)