[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475408#comment-15475408 ]
Michael Han commented on ZOOKEEPER-2080: ---------------------------------------- Thanks for feedback, [~fpj]! bq. Is it the case that the {{ synchronized(self) }} block only needs to cover up to {{self.processReconfig}} I think from the way the code was implemented, the block is to cover not just the processReconfig, but also the {{if (!rqv.equals(curQV)) {}} following, and the entire block has to be a single atomic operation. bq. If so, perhaps we should end the block there and process the following outside. Are you suggesting something like [this|https://github.com/hanm/zookeeper/blob/60bcf70b83605e0fd574a95fd00a32e7e763f6f7/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L292]? Here I make the sync block only covers processReconfig. However there are a couple of reconfig tests failed deterministically because of such a change. I haven't figured out why those tests failed, but it could be the entire code block (processReconfig + the rest of what's the current sync covered currently) has to be covered such that the processReconfig and this has to be a transaction together: {code} if (!rqv.equals(curQV)) { LOG.info("restarting leader election"); self.shuttingDownLE = true; self.getElectionAlg().shutdown(); break; } {code} So making the sync block only covers up to processReconfig probably would not work ... (I need to dig deeper into the reconfig logic - probably do some data flow analysis to figure out the dependencies:) - for now I can't prove on paper it's safe (or not) as for where the sync block ends (I am using existing tests as the verifier). > ReconfigRecoveryTest fails intermittently > ----------------------------------------- > > Key: ZOOKEEPER-2080 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080 > Project: ZooKeeper > Issue Type: Sub-task > Reporter: Ted Yu > Assignee: Michael Han > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, > ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, > jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, repro-20150816.log, > threaddump.log > > > I got the following test failure on MacBook with trunk code: > {code} > Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec > FAILED > waiting for server 2 being up > junit.framework.AssertionFailedError: waiting for server 2 being up > at > org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)