[ https://issues.apache.org/jira/browse/ZOOKEEPER-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marshall McMullen updated ZOOKEEPER-1124:
-----------------------------------------
Attachment: ZOOKEEPER-1124.patch
Replacing the earlier patch with a better one that includes a unit test
demonstrating that the fix works. The unit test connects to a follower,
submits a multiop to it, and then verifies that the multiop succeeded.
When I run this unit test WITHOUT the required fixes in
FollowerRequestProcessor.java and ObserverRequestProcessor.java, I get a
failure that reproduces the failures I've seen in our integration of multi:
Testcase: testMultiToFollower took 28.451 sec
Caused an ERROR
KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:886)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:876)
    at org.apache.zookeeper.test.QuorumTest.testMultiToFollower(QuorumTest.java:89)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
WITH the fixes in the included patch, the unit test passes correctly.
> Multiop submitted to non-leader always fails due to timeout
> -----------------------------------------------------------
>
> Key: ZOOKEEPER-1124
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1124
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.0
> Environment: all
> Reporter: Marshall McMullen
> Priority: Critical
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-1124.patch
>
>
> The new multiop support added under ZOOKEEPER-965 fails every single time if
> the multiop is submitted to a non-leader in quorum mode. In standalone mode
> it always works properly; the bug only presents itself in quorum mode
> (with 2 or more nodes). After 12 hours of debugging (*sigh*) it turns out to
> be a really simple fix. A couple of case statements are missing from
> FollowerRequestProcessor.java and ObserverRequestProcessor.java, so the
> multiop is never forwarded to the leader for commit. I've attached a patch
> that fixes this problem.
> It's probably worth noting that ZOOKEEPER-965 has already been committed to
> trunk. But this is a fatal flaw that will prevent multiop support from
> working properly, and as such the fix needs to get committed to 3.4.0 as
> well. Is there a way to link these two issues?
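
The missing-case problem described above can be illustrated with a small, self-contained Java sketch. This is a simplified model, not the actual ZooKeeper code: the class MultiRoutingSketch, the OpType and Route enums, and both routing methods are hypothetical names invented for illustration. It only shows how a request type with no case in a switch falls through to the default branch and is therefore never forwarded to the leader.

```java
// Simplified, hypothetical model of follower-side request routing.
// None of these names are ZooKeeper's real classes or opcodes.
final class MultiRoutingSketch {

    /** Stand-ins for client request types. */
    enum OpType { GET_DATA, CREATE, DELETE, SET_DATA, MULTI }

    /** Where the modeled follower sends a request. */
    enum Route { LOCAL, LEADER }

    /** Routing before the fix: MULTI has no case, so it hits the default branch. */
    static Route routeBeforeFix(OpType type) {
        switch (type) {
            case CREATE:
            case DELETE:
            case SET_DATA:
                return Route.LEADER;   // writes are forwarded to the leader for commit
            default:
                return Route.LOCAL;    // MULTI lands here and is never forwarded
        }
    }

    /** Routing after the fix: MULTI is treated like any other write. */
    static Route routeAfterFix(OpType type) {
        switch (type) {
            case CREATE:
            case DELETE:
            case SET_DATA:
            case MULTI:                // the added case
                return Route.LEADER;
            default:
                return Route.LOCAL;
        }
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + routeBeforeFix(OpType.MULTI));
        System.out.println("after fix:  " + routeAfterFix(OpType.MULTI));
    }
}
```

In this model the only difference between the two methods is the extra case label, which matches the description of the fix as a couple of missing case statements. How the real server behaves when the request is not forwarded is not modeled here; the report only states that the client sees a timeout and ConnectionLoss.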