[
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538781#comment-16538781
]
Hadoop QA commented on ZOOKEEPER-3072:
--------------------------------------
-1 overall. GitHub Pull Request Build
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1)
warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1920//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1920//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1920//console
This message is automatically generated.
> Race condition in throttling
> ----------------------------
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
> Reporter: Botond Hejj
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> There is a race condition in the server throttling code: disableRecv() can be
> called after enableRecv().
> Basically, the I/O worker thread does the following at the end of processPacket():
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>
>             submitRequest(si);
>         }
>     }
>     cnxn.incrOutstandingRequests(h);
> }
>
> incrOutstandingRequests() checks whether the outstanding-request limit has
> been breached and, if so, turns on throttling:
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>
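To make the throttle-on path concrete, here is a simplified, hypothetical model (not ZooKeeper's actual NIOServerCnxn). It assumes that submitRequest() raises an in-process counter and that throttling means flipping an AtomicBoolean named throttled with a compare-and-set; the class and field names are illustrative only, and the limit stands in for globalOutstandingLimit.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of the throttle-on path (not ZooKeeper's real class). */
final class IncrOutstandingSketch {
    private final int outstandingLimit;                          // assumed stand-in for globalOutstandingLimit
    private final AtomicInteger inProcess = new AtomicInteger(); // requests currently in the pipeline
    private final AtomicBoolean throttled = new AtomicBoolean(false);

    IncrOutstandingSketch(int outstandingLimit) {
        this.outstandingLimit = outstandingLimit;
    }

    /** Stand-in for submitRequest(): one more request is in flight. */
    void submitRequest() {
        inProcess.incrementAndGet();
    }

    /** Stand-in for incrOutstandingRequests(): throttle once the limit is exceeded. */
    void incrOutstandingRequests() {
        if (inProcess.get() > outstandingLimit) {
            disableRecv();
        }
    }

    /** CAS false -> true: succeeds only if the connection is not already throttled. */
    boolean disableRecv() {
        return throttled.compareAndSet(false, true);
    }

    boolean isThrottled() {
        return throttled.get();
    }

    public static void main(String[] args) {
        IncrOutstandingSketch cnxn = new IncrOutstandingSketch(1);
        for (int i = 1; i <= 2; i++) {
            cnxn.submitRequest();
            cnxn.incrOutstandingRequests();
            System.out.println("in flight=" + i + ", throttled=" + cnxn.isThrottled());
        }
        // in flight=1, throttled=false
        // in flight=2, throttled=true
    }
}
```

With a limit of 1, the second in-flight request is the one that turns throttling on.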
> submitRequest() creates a logical request and enqueues it so that the
> Processor thread can pick it up. After dequeuing the request, the Processor
> thread does the necessary handling and then makes the following call
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]:
>
> cnxn.sendResponse(hdr, rsp, "response");
>
> In sendResponse(), the response is first appended to the outgoing buffer, and
> then the code checks whether un-throttling is needed:
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>
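The un-throttle side can be sketched the same way. This is again a hypothetical model: the condition "fewer requests in flight than the limit" is an assumed simplification of the real check linked above. The point it illustrates is that enableRecv() is a compare-and-set from true to false, so it does nothing if the connection is not currently throttled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of the un-throttle path in sendResponse(). */
final class SendResponseSketch {
    private final int outstandingLimit;
    final AtomicInteger inProcess = new AtomicInteger();
    final AtomicBoolean throttled = new AtomicBoolean(false);
    final List<String> outgoingBuffer = new ArrayList<>(); // simplified: single-threaded demo only

    SendResponseSketch(int outstandingLimit) {
        this.outstandingLimit = outstandingLimit;
    }

    /** Stand-in for sendResponse(): append first, then decide whether to un-throttle. */
    void sendResponse(String response) {
        outgoingBuffer.add(response);                          // 1. queue the response
        if (inProcess.decrementAndGet() < outstandingLimit) {  // 2. assumed un-throttle condition
            enableRecv();
        }
    }

    /** CAS true -> false: a no-op (returns false) if the connection is not throttled. */
    boolean enableRecv() {
        return throttled.compareAndSet(true, false);
    }

    public static void main(String[] args) {
        SendResponseSketch cnxn = new SendResponseSketch(1);
        cnxn.inProcess.set(2);    // two requests in flight...
        cnxn.throttled.set(true); // ...so reads are currently throttled
        cnxn.sendResponse("r1");
        System.out.println("after r1, throttled=" + cnxn.throttled.get()); // true (1 in flight)
        cnxn.sendResponse("r2");
        System.out.println("after r2, throttled=" + cnxn.throttled.get()); // false (0 in flight)
    }
}
```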
> However, if there is a context switch between submitRequest() and
> cnxn.incrOutstandingRequests(), the Processor thread can complete the
> cnxn.sendResponse() call before the I/O thread resumes. In that case
> enableRecv() runs before disableRecv(): the CAS in enableRecv() fails because
> the connection is not yet throttled, while the CAS in disableRecv() succeeds.
> The result is a deadlock: un-throttling is needed to let requests in,
> sendResponse() is needed to trigger un-throttling, but sendResponse() requires
> an incoming message. From that point on, the ZK server no longer selects the
> affected client socket for reads, leading to the observed client-side failure
> in the subject.
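The interleaving above can be replayed deterministically with two threads and latches. In this hypothetical sketch the latches simulate the context switch between submitRequest() and incrOutstandingRequests(); only the CAS semantics of enableRecv()/disableRecv() from the report are modeled, and everything else (class, thread and field names) is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model that forces the bad interleaving described in ZOOKEEPER-3072. */
final class ThrottleRaceSketch {
    // true = the selector no longer reads from this client's socket
    final AtomicBoolean throttled = new AtomicBoolean(false);
    volatile boolean enableSucceeded;
    volatile boolean disableSucceeded;

    boolean enableRecv()  { return throttled.compareAndSet(true, false); }
    boolean disableRecv() { return throttled.compareAndSet(false, true); }

    /** sendResponse()'s enableRecv() runs before incrOutstandingRequests()'s disableRecv(). */
    void runBadInterleaving() throws InterruptedException {
        CountDownLatch submitted = new CountDownLatch(1);
        CountDownLatch responded = new CountDownLatch(1);

        Thread ioThread = new Thread(() -> {
            submitted.countDown();             // submitRequest(si): request handed off
            awaitQuietly(responded);           // simulated context switch
            disableSucceeded = disableRecv();  // incrOutstandingRequests(h): limit breached
        }, "io-worker");

        Thread processorThread = new Thread(() -> {
            awaitQuietly(submitted);           // Processor dequeues the request
            enableSucceeded = enableRecv();    // sendResponse(): un-throttle check
            responded.countDown();
        }, "processor");

        ioThread.start();
        processorThread.start();
        ioThread.join();
        processorThread.join();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleRaceSketch cnxn = new ThrottleRaceSketch();
        cnxn.runBadInterleaving();
        System.out.println("enableRecv succeeded:  " + cnxn.enableSucceeded);  // false
        System.out.println("disableRecv succeeded: " + cnxn.disableSucceeded); // true
        System.out.println("still throttled:       " + cnxn.throttled.get());  // true
    }
}
```

Because no response is left to call enableRecv() again, the throttled flag stays set, which is the stuck state described above.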
> To reproduce this, set globalOutstandingLimit down to 1; this makes the race
> easier to hit because throttling starts after fewer requests.
>
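For the reproduction suggested above, globalOutstandingLimit can be set through the zookeeper.globalOutstandingLimit Java system property (default 1000), e.g. -Dzookeeper.globalOutstandingLimit=1 on the server JVM. A minimal sketch for an in-process test harness, assuming the property is set before the server reads it; the class name is hypothetical.

```java
/** Minimal sketch: lower the throttling limit before an in-process server is started. */
final class LowOutstandingLimit {
    // ZooKeeper system property backing globalOutstandingLimit
    static final String GLOBAL_OUTSTANDING_LIMIT = "zookeeper.globalOutstandingLimit";

    public static void main(String[] args) {
        // Assumption: this runs before the server is created (e.g., at the top of a test).
        System.setProperty(GLOBAL_OUTSTANDING_LIMIT, "1");
        System.out.println(GLOBAL_OUTSTANDING_LIMIT + "=" + Integer.getInteger(GLOBAL_OUTSTANDING_LIMIT));
        // prints zookeeper.globalOutstandingLimit=1
    }
}
```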
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)