[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475408#comment-15475408
 ] 

Michael Han commented on ZOOKEEPER-2080:
----------------------------------------

Thanks for feedback, [~fpj]!

bq. Is it the case that the {{ synchronized(self) }} block only needs to cover 
up to {{self.processReconfig}}

I think from the way the code was implemented, the block is to cover not just 
the processReconfig, but also the {{if (!rqv.equals(curQV)) {}} following, and 
the entire block has to be a single atomic operation.

bq. If so, perhaps we should end the block there and process the following 
outside. 

Are you suggesting something like 
[this|https://github.com/hanm/zookeeper/blob/60bcf70b83605e0fd574a95fd00a32e7e763f6f7/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L292]?
 Here I make the sync block only covers processReconfig. However there are a 
couple of reconfig tests failed deterministically because of such a change. I 
haven't figured out why those tests failed, but it could be the entire code 
block (processReconfig + the rest of what's the current sync covered currently) 
has to be covered such that the processReconfig and this has to be a 
transaction together:
{code}
                               if (!rqv.equals(curQV)) {
                                    LOG.info("restarting leader election");
                                    self.shuttingDownLE = true;
                                    self.getElectionAlg().shutdown();
                                    break;
                                }
{code}

So making the sync block only covers up to processReconfig probably would not 
work ... (I need to dig deeper into the reconfig logic - probably do some data 
flow analysis to figure out the dependencies:) - for now I can't prove on paper 
it's safe (or not) as for where the sync block ends (I am using existing tests 
as the verifier).

> ReconfigRecoveryTest fails intermittently
> -----------------------------------------
>
>                 Key: ZOOKEEPER-2080
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
>             Project: ZooKeeper
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Michael Han
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, repro-20150816.log, 
> threaddump.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to