[
https://issues.apache.org/jira/browse/ZOOKEEPER-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266130#comment-16266130
]
Bogdan Kanivets edited comment on ZOOKEEPER-2916 at 11/26/17 6:45 PM:
----------------------------------------------------------------------
I don't have a solution yet, but comparing successful and failed runs, the problem
seems to be in the leader election that follows these steps:
{code:java}
startObservers(observerStrings);
testReconfig(follower2, true, reconfigServers); //add participants
testReconfig(follower2, true, observerStrings); //change to observers
{code}
The observers here are first started as participants and take part in the election;
only later are they converted to observers.
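To make the role change concrete, here is a minimal sketch. It assumes (not verified against StandaloneDisabledTest) that `reconfigServers` and `observerStrings` hold entries in ZooKeeper's dynamic-configuration format and that, for the same server, the two entries differ only in the role field. The helper `entry` is hypothetical; the ports are those of server.3 in the successful run's final configuration.

```java
public class ServerEntry {
    // Hypothetical helper: formats one dynamic-configuration entry of the form
    // server.<id>=<host>:<quorumPort>:<electionPort>:<role>;<host>:<clientPort>
    static String entry(int id, String host, int quorumPort, int electionPort,
                        String role, int clientPort) {
        return "server." + id + "=" + host + ":" + quorumPort + ":" + electionPort
                + ":" + role + ";" + host + ":" + clientPort;
    }

    public static void main(String[] args) {
        // First reconfig (assumed): the new server joins as a voting participant.
        String asParticipant = entry(3, "localhost", 27390, 27391, "participant", 27389);
        // Second reconfig (assumed): same addresses, role switched to observer.
        String asObserver = entry(3, "localhost", 27390, 27391, "observer", 27389);
        System.out.println(asParticipant); // server.3=localhost:27390:27391:participant;localhost:27389
        System.out.println(asObserver);    // server.3=localhost:27390:27391:observer;localhost:27389
    }
}
```

Because only the role changes, the second reconfig turns two voting members into non-voting ones, which is presumably why leader election runs again at that point.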
Looking at the failed run
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1701/
and filtering the console log by the ports assigned to the test:
{code:bash}
grep "1122[2-9]\|1123[0-6]" consoleFull-jdk7.html
{code}
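The grep pattern selects exactly the 15 ports 11222 to 11236. A minimal Java sketch of the same filter follows (class and method names are hypothetical); note that in `java.util.regex` the alternation bar needs no backslash, unlike grep's basic regex syntax. The second sample line is made up to show a port outside the range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PortFilter {
    // Java equivalent of grep "1122[2-9]\|1123[0-6]".
    static final Pattern FAILED_RUN_PORTS = Pattern.compile("1122[2-9]|1123[0-6]");

    // Keeps lines containing a match anywhere, like grep without -w or -x.
    static List<String> filter(List<String> lines, Pattern ports) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (ports.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Lists every port in [from, to] that the pattern matches.
    static List<Integer> matchedPorts(Pattern ports, int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            if (ports.matcher(Integer.toString(p)).find()) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "binding to port localhost/127.0.0.1:11234", // from the failed run
            "binding to port localhost/127.0.0.1:11240"); // made-up, out of range
        System.out.println(filter(log, FAILED_RUN_PORTS).size()); // 1
        List<Integer> ports = matchedPorts(FAILED_RUN_PORTS, 11200, 11250);
        System.out.println(ports.get(0) + ".." + ports.get(ports.size() - 1)); // 11222..11236
    }
}
```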
After the second observer is up:
{noformat}
[junit] 2017-11-16 19:53:26,403 [myid:4] - INFO [Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:11234
{noformat}
there is only one "Restarting Leader Election" (from myid=3):
{noformat}
[junit] 2017-11-16 19:53:26,737 [myid:3] - WARN [QuorumPeer[myid=3](plain=/127.0.0.1:11231)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
{noformat}
Then, about 20 s later:
{noformat}
[junit] 2017-11-16 19:53:46,715 [myid:3] - WARN [localhost/127.0.0.1:11233:QuorumCnxManager@348] - Exception reading or writing challenge: java.net.SocketTimeoutException: Read timed out
{noformat}
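The difference between the two runs can be expressed as a simple log check. This is a hypothetical helper, and the heuristic is drawn from only the two runs compared here (one election restart followed by a read timeout in the failed run, two restarts and no timeout in the successful one); it is not a root-cause diagnosis. The sample strings are the log records quoted in this comment, each joined onto one line.

```java
import java.util.Arrays;
import java.util.List;

public class ElectionSignature {
    // Counts "Restarting Leader Election" records in a port-filtered log.
    static int countRestarts(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (line.contains("Restarting Leader Election")) {
                n++;
            }
        }
        return n;
    }

    // True if any record reports the challenge read timing out.
    static boolean hasReadTimeout(List<String> lines) {
        for (String line : lines) {
            if (line.contains("SocketTimeoutException: Read timed out")) {
                return true;
            }
        }
        return false;
    }

    // Heuristic only: one restart plus a read timeout matched the failed run.
    static boolean looksLikeFailedRun(List<String> lines) {
        return countRestarts(lines) == 1 && hasReadTimeout(lines);
    }

    public static void main(String[] args) {
        List<String> failed = Arrays.asList( // build 1701
            "[QuorumPeer[myid=3](plain=/127.0.0.1:11231)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election",
            "[localhost/127.0.0.1:11233:QuorumCnxManager@348] - Exception reading or writing challenge: java.net.SocketTimeoutException: Read timed out");
        List<String> passed = Arrays.asList( // build 1702
            "[QuorumPeer[myid=4](plain=/127.0.0.1:27392)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election",
            "[QuorumPeer[myid=3](plain=/127.0.0.1:27389)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election");
        System.out.println(looksLikeFailedRun(failed)); // true
        System.out.println(looksLikeFailedRun(passed)); // false
    }
}
```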
In the successful run
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1702/
filtered the same way by that run's ports:
{code:bash}
grep "2738[0-9]\|2739[0-4]" consoleFull-jdk7-success.html
{code}
After the second observer starts:
{noformat}
[junit] 2017-11-17 20:18:40,311 [myid:4] - INFO [Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:27392
{noformat}
there are leader election restarts from two peers (myid=4 and myid=3):
{noformat}
[junit] 2017-11-17 20:18:43,891 [myid:4] - WARN [QuorumPeer[myid=4](plain=/127.0.0.1:27392)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
[junit] 2017-11-17 20:18:43,894 [myid:3] - WARN [QuorumPeer[myid=3](plain=/127.0.0.1:27389)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
{noformat}
There is no "Read timed out", and the test finishes about 3 s later:
{noformat}
[junit] 2017-11-17 20:18:46,133 [myid:] - INFO [main:StandaloneDisabledTest@114] - Configuration after adding two observers:
[junit] server.2=localhost:27387:27388:participant;localhost:27386
[junit] server.3=localhost:27390:27391:observer;localhost:27389
[junit] server.4=localhost:27393:27394:observer;localhost:27392
{noformat}
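The final configuration can be checked by reading the role field of each entry. A minimal parsing sketch with a hypothetical class name, assuming the `server.<id>=<host>:<quorumPort>:<electionPort>[:<role>];[<clientHost>:]<clientPort>` layout shown above, a missing role meaning participant, and host names without colons (IPv6 literals would need extra handling):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigRoles {
    // Returns server id -> role, in the order the entries appear.
    static Map<Integer, String> roles(List<String> lines) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("server.")) {
                continue; // skip anything that is not a server entry
            }
            int eq = trimmed.indexOf('=');
            int id = Integer.parseInt(trimmed.substring("server.".length(), eq));
            // Drop the client-address part after ';' and split the server part.
            String serverPart = trimmed.substring(eq + 1).split(";", 2)[0];
            String[] fields = serverPart.split(":");
            String role = fields.length >= 4 ? fields[3] : "participant";
            result.put(id, role);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList(
            "server.2=localhost:27387:27388:participant;localhost:27386",
            "server.3=localhost:27390:27391:observer;localhost:27389",
            "server.4=localhost:27393:27394:observer;localhost:27392");
        System.out.println(roles(config)); // {2=participant, 3=observer, 4=observer}
    }
}
```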
> startSingleServerTest may be flaky
> ----------------------------------
>
> Key: ZOOKEEPER-2916
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2916
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Affects Versions: 3.5.3, 3.6.0
> Reporter: Patrick Hunt
> Assignee: Bogdan Kanivets
> Labels: newbie
>
> startSingleServerTest seems to be failing intermittently: 10 times in the
> first few days of this month. Can someone take a look?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)