[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696437#comment-14696437
 ] 

Akihiro Suda commented on ZOOKEEPER-2080:
-----------------------------------------

The bug can almost always be reproduced by injecting an 80 ms delay into every 
FLE packet with my tool:
https://github.com/osrg/earthquake/tree/9078c5b039762f6c201ee036ac3453caf6168055/example/zk-repro-2080.nfqhook

When I comment out 
[{{Socket#setTcpNoDelay(true)}}|https://github.com/apache/zookeeper/blob/5b1b668d33ccf7d93c31db2a53728177393fea90/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L566]
 in {{QCnxM#setSockOpts()}}, the bug becomes hard to reproduce.
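
For reference, here is a minimal sketch of the toggle described above. It is not ZooKeeper code: the class {{NoDelayToggleSketch}}, the helper {{applySockOpts}}, and the {{disableNagle}} flag are hypothetical names used only for illustration. Setting {{TCP_NODELAY}} disables Nagle's algorithm, so skipping the call leaves the platform default in effect, which may coalesce small writes and change packet timing.

```java
import java.io.IOException;
import java.net.Socket;

public class NoDelayToggleSketch {

    // Hypothetical helper (an assumption, not the real QCnxM#setSockOpts()).
    // When disableNagle is true it sets TCP_NODELAY, as the linked line does;
    // when false it mimics commenting that line out and leaves the default.
    static boolean applySockOpts(Socket sock, boolean disableNagle) throws IOException {
        if (disableNagle) {
            sock.setTcpNoDelay(true);
        }
        // Report the option value the socket actually ends up with.
        return sock.getTcpNoDelay();
    }

    public static void main(String[] args) throws IOException {
        // Unconnected sockets are enough to set and read back the option.
        try (Socket withNoDelay = new Socket();
             Socket platformDefault = new Socket()) {
            System.out.println("TCP_NODELAY after setTcpNoDelay(true): "
                    + applySockOpts(withNoDelay, true));
            System.out.println("TCP_NODELAY when the call is skipped: "
                    + applySockOpts(platformDefault, false));
        }
    }
}
```

The sketch only shows which option changes between the two runs; it does not reproduce the failure itself.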

So I guess the bug is caused by a race condition in {{QCnxM}} (or in {{FLE}}).

Can anyone give us some advice about suspicious points in {{QCnxM}}?

ZOOKEEPER-2246 might be related to 2080, but applying [the 
fix|https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14694804&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14694804]
 proposed in 2246 alone does not resolve 2080.


> ReconfigRecoveryTest fails intermittently
> -----------------------------------------
>
>                 Key: ZOOKEEPER-2080
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
>             Project: ZooKeeper
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Raul Gutierrez Segales
>            Priority: Minor
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
