[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696437#comment-14696437 ]
Akihiro Suda commented on ZOOKEEPER-2080: ----------------------------------------- The bug can be almost always reproduced by injecting 80 msecs delay to every FLE packets with my tool: https://github.com/osrg/earthquake/tree/9078c5b039762f6c201ee036ac3453caf6168055/example/zk-repro-2080.nfqhook When I comment out [{{Socket#setTcpNoDelay(true)}}|https://github.com/apache/zookeeper/blob/5b1b668d33ccf7d93c31db2a53728177393fea90/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L566] in {{QCnxM#setSockOpts()}}, the bug gets hard to be reproduced. So I guess the bug is caused by a race condition in {{QCnxM}} (or in {{FLE}}). Anyone can give us some advice about suspicious point in {{QCnxM}}? ZOOKEEPER-2246 might be related to 2080, but just applying [this fix|https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14694804&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14694804] proposed in 2246 does not resolve 2080. > ReconfigRecoveryTest fails intermittently > ----------------------------------------- > > Key: ZOOKEEPER-2080 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080 > Project: ZooKeeper > Issue Type: Sub-task > Reporter: Ted Yu > Assignee: Raul Gutierrez Segales > Priority: Minor > > I got the following test failure on MacBook with trunk code: > {code} > Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec > FAILED > waiting for server 2 being up > junit.framework.AssertionFailedError: waiting for server 2 being up > at > org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)