[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall McMullen updated ZOOKEEPER-1124:
-----------------------------------------

    Attachment: ZOOKEEPER-1124.patch

Replacing the earlier patch with a better one that includes a unit test 
demonstrating that the fix works. The unit test connects to a follower, 
submits a multiop to that follower, and then verifies that the multiop 
succeeded.

When I run this unit test WITHOUT the required fixes in 
FollowerRequestProcessor.java and ObserverRequestProcessor.java, I get a 
clean failure that reproduces the failures I've seen in our integration of 
multi:

Testcase: testMultiToFollower took 28.451 sec
    Caused an ERROR
KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:886)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:876)
    at org.apache.zookeeper.test.QuorumTest.testMultiToFollower(QuorumTest.java:89)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)

WITH the fixes in the included patch, the unit test passes correctly.
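
For anyone reading this without the patch open: the issue text quoted below 
says the fix is a couple of missing case statements so that multiop gets 
forwarded to the leader. A rough, self-contained way to picture that is the 
sketch here. It is NOT the code from the patch or from the ZooKeeper source; 
the class ForwardingSketch, the RequestType enum, and both method names are 
made up for illustration, and the real request processors do more than decide 
whether to forward.

```java
/** Illustrative model only; hypothetical names, not the ZooKeeper source. */
final class ForwardingSketch {

    /** Hypothetical stand-in for the request types a non-leader receives. */
    enum RequestType { CREATE, DELETE, SET_DATA, SET_ACL, SYNC, MULTI, GET_DATA, EXISTS }

    /** Models the behaviour described in the issue: MULTI has no case. */
    static boolean forwardsToLeaderBeforeFix(RequestType type) {
        switch (type) {
            case CREATE:
            case DELETE:
            case SET_DATA:
            case SET_ACL:
            case SYNC:
                return true;   // handed to the leader
            default:
                return false;  // MULTI falls through here and is never forwarded
        }
    }

    /** Same switch with the missing case added, as the issue describes. */
    static boolean forwardsToLeaderAfterFix(RequestType type) {
        switch (type) {
            case CREATE:
            case DELETE:
            case SET_DATA:
            case SET_ACL:
            case SYNC:
            case MULTI:        // the added case
                return true;
            default:
                return false;  // read-only requests stay local in this model
        }
    }

    public static void main(String[] args) {
        // Print how each request type is routed before and after the change.
        for (RequestType t : RequestType.values()) {
            System.out.println(t + " before=" + forwardsToLeaderBeforeFix(t)
                    + " after=" + forwardsToLeaderAfterFix(t));
        }
    }
}
```

In this model the only request type whose routing changes is MULTI, which 
matches the symptom in the issue title: a multiop sent to a non-leader never 
reaches the leader for commit, so the client eventually gives up.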

> Multiop submitted to non-leader always fails due to timeout
> -----------------------------------------------------------
>
>                 Key: ZOOKEEPER-1124
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1124
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.0
>         Environment: all
>            Reporter: Marshall McMullen
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1124.patch
>
>
> The new Multiop support added under zookeeper-965 fails every single time if 
> the multiop is submitted to a non-leader in quorum mode. In standalone mode 
> it always works properly and this bug only presents itself in quorum mode 
> (with 2 or more nodes). After 12 hours of debugging (*sigh*) it turns out to 
> be a really simple fix. There are a couple of missing case statements inside 
> FollowerRequestProcessor.java and ObserverRequestProcessor.java to ensure 
> that multiop is forwarded to the leader for commit. I've attached a patch 
> that fixes this problem.
> It's probably worth noting that zookeeper-965 has already been committed to 
> trunk. But this is a fatal flaw that will prevent multiop support from 
> working properly and as such needs to get committed to 3.4.0 as well. Is 
> there a way to tie these two cases together in some way?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
