[
https://issues.apache.org/jira/browse/ZOOKEEPER-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059935#comment-14059935
]
Hadoop QA commented on ZOOKEEPER-1863:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655418/ZOOKEEPER-1863.patch
against trunk revision 1609730.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
-1 findbugs. The patch appears to introduce 87 new Findbugs (version
2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2188//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2188//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2188//console
This message is automatically generated.
> Race condition in commit processor leading to out of order request
> completion, xid mismatch on client.
> ------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1863
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1863
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.0
> Reporter: Dutch T. Meyer
> Assignee: Dutch T. Meyer
> Priority: Blocker
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch,
> ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch, ZOOKEEPER-1863.patch,
> ZOOKEEPER-1863.patch, stack.17512
>
>
> In CommitProcessor.java processor, if we are at the primary request handler
> on line 167:
> {noformat}
> while (!stopped && !isWaitingForCommit() &&
> !isProcessingCommit() &&
> (request = queuedRequests.poll()) != null) {
> if (needCommit(request)) {
> nextPending.set(request);
> } else {
> sendToNextProcessor(request);
> }
> }
> {noformat}
> A request can be handled in this block and be quickly processed and completed
> on another thread. If queuedRequests is empty, we then exit the block. Next,
> before this thread makes any more progress, we can get 2 more requests, one
> get_children(say), and a sync placed on queuedRequests for the processor.
> Then, if we are very unlucky, the sync request can complete and this object's
> commit() routine is called (from FollowerZookeeperServer), which places the
> sync request on the previously empty committedRequests queue. At that point,
> this thread continues.
> We reach line 182, which is a check on sync requests.
> {noformat}
> if (!stopped && !isProcessingRequest() &&
> (request = committedRequests.poll()) != null) {
> {noformat}
> Here we are not processing any requests, because the original request has
> completed. We haven't dequeued either the read or the sync request in this
> processor. Next, the poll above will pull the sync request off the queue, and
> in the following block, the sync will get forwarded to the next processor.
> This is a problem because the read request hasn't been forwarded yet, so
> requests are now out of order.
> I've been able to reproduce this bug reliably by injecting a
> Thread.sleep(5000) between the two blocks above to make the race condition
> far more likely, then in a client program.
> {noformat}
> zoo_aget_children(zh, "/", 0, getchildren_cb, NULL);
> //Wait long enough for queuedRequests to drain
> sleep(1);
> zoo_aget_children(zh, "/", 0, getchildren_cb, &th_ctx[0]);
> zoo_async(zh, "/", sync_cb, &th_ctx[0]);
> {noformat}
> When this bug is triggered, 3 things can happen:
> 1) Clients will see requests complete out of order and fail on xid mismatches.
> 2) Kazoo in particular doesn't handle this runtime exception well, and can
> orphan outstanding requests.
> 3) I've seen zookeeper servers deadlock, likely because the commit cannot be
> completed, which can wedge the commit processor.
--
This message was sent by Atlassian JIRA
(v6.2#6252)