[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266130#comment-16266130
 ] 

Bogdan Kanivets edited comment on ZOOKEEPER-2916 at 11/26/17 6:43 PM:
----------------------------------------------------------------------

I don't have a solution yet, but comparing successful and failed runs, 
the problem seems to be around leader election after:

{code:java}
startObservers(observerStrings);
testReconfig(follower2, true, reconfigServers); //add participants
testReconfig(follower2, true, observerStrings); //change to observers
{code}

Observers here are started as participants and take part in the election, but 
they are later converted to observers.

Looking at the failed run:
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1701/

Filtering the console log by the assigned ports:

{code:java}
grep "1122[2-9]\|1123[0-6]" consoleFull-jdk7.html
{code}


After the second observer is up:
[junit] 2017-11-16 19:53:26,403 [myid:4] - INFO  
[Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:11234

there is only one "Restarting Leader Election":
[junit] 2017-11-16 19:53:26,737 [myid:3] - WARN  
[QuorumPeer[myid=3](plain=/127.0.0.1:11231)(secure=disabled):QuorumPeer@1427] - 
Restarting Leader Election
and then, 20s later:
[junit] 2017-11-16 19:53:46,715 [myid:3] - WARN  
[localhost/127.0.0.1:11233:QuorumCnxManager@348] - Exception reading or writing 
challenge: java.net.SocketTimeoutException: Read timed out
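
The "20s later" gap can be checked directly from the two timestamps above. A minimal sketch, assuming the "yyyy-MM-dd HH:mm:ss,SSS" timestamp layout seen in these log lines; the class and method names (LogGap, millisBetween) are made up for illustration and are not part of ZooKeeper:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogGap {
    // Timestamp layout as it appears in the quoted [junit] log lines (assumption).
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long millisBetween(String start, String end) {
        LocalDateTime a = LocalDateTime.parse(start, TS);
        LocalDateTime b = LocalDateTime.parse(end, TS);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // "Restarting Leader Election" vs. "Read timed out" in the failed run.
        long gap = millisBetween("2017-11-16 19:53:26,737", "2017-11-16 19:53:46,715");
        System.out.println(gap + " ms"); // 19978 ms
    }
}
```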

On the successful run:
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1702
grep "2738[0-9]\|2739[0-4]" consoleFull-jdk7-success.html

After the second observer starts:
[junit] 2017-11-17 20:18:40,311 [myid:4] - INFO  
[Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:27392

there are leader election restarts from two peers:
[junit] 2017-11-17 20:18:43,891 [myid:4] - WARN  
[QuorumPeer[myid=4](plain=/127.0.0.1:27392)(secure=disabled):QuorumPeer@1427] - 
Restarting Leader Election
[junit] 2017-11-17 20:18:43,894 [myid:3] - WARN  
[QuorumPeer[myid=3](plain=/127.0.0.1:27389)(secure=disabled):QuorumPeer@1427] - 
Restarting Leader Election

There is no "Read timed out", and the test is done after 3s:
[junit] 2017-11-17 20:18:46,133 [myid:] - INFO  
[main:StandaloneDisabledTest@114] - Configuration after adding two observers:
[junit] server.2=localhost:27387:27388:participant;localhost:27386
[junit] server.3=localhost:27390:27391:observer;localhost:27389
[junit] server.4=localhost:27393:27394:observer;localhost:27392
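
The comparison above (one election restart plus a read timeout in the failed run, two restarts and no timeout in the successful one) could be automated for other builds. This is only a sketch: it assumes each log entry sits on a single line in the console file (the excerpts here are wrapped by email), and ElectionLogSummary with its methods is a hypothetical helper, not something in the ZooKeeper tree:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ElectionLogSummary {
    // Matches the "[myid:N]" tag that prefixes each server's log lines.
    private static final Pattern MYID = Pattern.compile("\\[myid:(\\d+)\\]");

    // Counts "Restarting Leader Election" messages per server id.
    static Map<String, Integer> countRestarts(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (!line.contains("Restarting Leader Election")) {
                continue;
            }
            Matcher m = MYID.matcher(line);
            String id = m.find() ? m.group(1) : "unknown";
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    // True if any line reports the socket read timeout seen in the failed run.
    static boolean hasReadTimeout(List<String> lines) {
        for (String line : lines) {
            if (line.contains("Read timed out")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a console log, ideally already filtered by port with grep.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println("restarts per myid: " + countRestarts(lines));
        System.out.println("read timeout seen: " + hasReadTimeout(lines));
    }
}
```

On just the lines quoted in this comment, the failed run would give {3=1} with a timeout and the successful run {3=1, 4=1} without one. A full console log may contain earlier restarts too, so the counts are only meaningful for the window after the second observer starts.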







> startSingleServerTest may be flaky
> ----------------------------------
>
>                 Key: ZOOKEEPER-2916
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2916
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.5.3, 3.6.0
>            Reporter: Patrick Hunt
>            Assignee: Bogdan Kanivets
>              Labels: newbie
>
> startSingleServerTest seems to be failing intermittently. 10 times in the 
> first few days of this month. Can someone take a look?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
