Benno Evers created MESOS-9467:
----------------------------------
Summary:
ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
is flaky
Key: MESOS-9467
URL: https://issues.apache.org/jira/browse/MESOS-9467
Project: Mesos
Issue Type: Bug
Environment: Mac OSX with ssl enabled
Reporter: Benno Evers
The following error was observed in an internal CI run:
{noformat}
../../src/tests/master_contender_detector_tests.cpp:872: Failure
Failed to wait 15secs for detected
{noformat}
Full log:
{noformat}
[ RUN ]
ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
I1210 16:18:13.068011 338650560 zookeeper_test_server.cpp:156] Started
ZooKeeperTestServer on port 54990
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:13,068:28813(0x70000e2f6000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3883958d0
flags=0
2018-12-10 16:18:13,068:28813(0x70000ed1d000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
I1210 16:18:13.069262 236376064 contender.cpp:152] Joining the ZK group
2018-12-10 16:18:13,070:28813(0x70000ed1d000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90000, negotiated timeout=10000
I1210 16:18:13.070789 239058944 group.cpp:341] Group process
(zookeeper-group(28)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:13.070853 239058944 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (1, 0, 0)
I1210 16:18:13.070868 239058944 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
I1210 16:18:13.073835 235839488 contender.cpp:268] New candidate (id='0') has
entered the contest for leadership
I1210 16:18:13.074319 237985792 detector.cpp:152] Detected a new leader:
(id='0')
I1210 16:18:13.074406 237449216 group.cpp:700] Trying to get
'/mesos/json.info_0000000000' in ZooKeeper
I1210 16:18:13.075139 239058944 zookeeper.cpp:262] A new leading master
([email protected]:10000) is detected
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:13,075:28813(0x70000e273000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3886b40e0
flags=0
2018-12-10 16:18:13,075:28813(0x70000f944000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
I1210 16:18:13.076236 238522368 contender.cpp:152] Joining the ZK group
2018-12-10 16:18:13,077:28813(0x70000f944000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90001, negotiated timeout=10000
I1210 16:18:13.077278 239058944 group.cpp:341] Group process
(zookeeper-group(29)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:13.077311 239058944 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (1, 0, 0)
I1210 16:18:13.077327 239058944 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
I1210 16:18:13.079026 236912640 contender.cpp:268] New candidate (id='1') has
entered the contest for leadership
I1210 16:18:13.079605 237449216 detector.cpp:152] Detected a new leader:
(id='0')
I1210 16:18:13.079732 236912640 group.cpp:700] Trying to get
'/mesos/json.info_0000000000' in ZooKeeper
I1210 16:18:13.080291 237449216 zookeeper.cpp:262] A new leading master
([email protected]:10000) is detected
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:13,080:28813(0x70000e3fc000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd388623260
flags=0
2018-12-10 16:18:13,081:28813(0x70000fa4a000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
2018-12-10 16:18:13,082:28813(0x70000fa4a000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90002, negotiated timeout=10000
I1210 16:18:13.082370 239058944 group.cpp:341] Group process
(zookeeper-group(30)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:13.082406 239058944 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I1210 16:18:13.082422 239058944 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
I1210 16:18:13.083561 235302912 detector.cpp:152] Detected a new leader:
(id='0')
I1210 16:18:13.083673 237985792 group.cpp:700] Trying to get
'/mesos/json.info_0000000000' in ZooKeeper
I1210 16:18:13.084331 235302912 zookeeper.cpp:262] A new leading master
([email protected]:10000) is detected
2018-12-10
16:18:13,085:28813(0x70000fa4a000):ZOO_ERROR@handle_socket_error_msg@1782:
Socket [127.0.0.1:54990] zk retcode=-4, errno=64(Host is down): failed while
receiving a server response
I1210 16:18:13.085988 236376064 group.cpp:452] Lost connection to ZooKeeper,
attempting to reconnect ...
2018-12-10
16:18:13,086:28813(0x70000ed1d000):ZOO_ERROR@handle_socket_error_msg@1782:
Socket [127.0.0.1:54990] zk retcode=-4, errno=64(Host is down): failed while
receiving a server response
I1210 16:18:13.086808 236912640 group.cpp:452] Lost connection to ZooKeeper,
attempting to reconnect ...
I1210 16:18:13.087200 236376064 detector.cpp:140] The current leader (id=0) is
lost
I1210 16:18:13.087239 236376064 detector.cpp:152] Detected a new leader:
(id='1')
I1210 16:18:13.087339 237985792 group.cpp:700] Trying to get
'/mesos/json.info_0000000001' in ZooKeeper
I1210 16:18:13.087929 236376064 zookeeper.cpp:262] A new leading master
([email protected]:10001) is detected
W1210 16:18:23.086897 239058944 group.cpp:495] Timed out waiting to connect to
ZooKeeper. Forcing ZooKeeper session (sessionId=1679aa0ddc90002) expiration
I1210 16:18:23.087049 239058944 group.cpp:511] ZooKeeper session expired
I1210 16:18:23.087095 235302912 detector.cpp:140] The current leader (id=0) is
lost
I1210 16:18:23.087132 235302912 detector.cpp:152] Detected a new leader: None
2018-12-10 16:18:23,087:28813(0x70000e3fc000):ZOO_INFO@zookeeper_close@2579:
Freeing zookeeper resources for sessionId=0x1679aa0ddc90002
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:23,087:28813(0x70000e067000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd388623260
flags=0
2018-12-10 16:18:23,087:28813(0x70000f4b5000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
W1210 16:18:23.087962 235302912 group.cpp:495] Timed out waiting to connect to
ZooKeeper. Forcing ZooKeeper session (sessionId=1679aa0ddc90000) expiration
I1210 16:18:23.088090 235302912 group.cpp:511] ZooKeeper session expired
I1210 16:18:23.088133 237449216 detector.cpp:140] The current leader (id=0) is
lost
I1210 16:18:23.088150 237449216 detector.cpp:152] Detected a new leader: None
I1210 16:18:23.088217 237985792 contender.cpp:217] Membership cancelled: 0
2018-12-10 16:18:23,088:28813(0x70000e067000):ZOO_INFO@zookeeper_close@2579:
Freeing zookeeper resources for sessionId=0x1679aa0ddc90000
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:23,088:28813(0x70000e16d000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3885d9a10
flags=0
2018-12-10 16:18:23,088:28813(0x70000ed1d000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
2018-12-10 16:18:23,165:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597:
Exceeded deadline by 6746ms
2018-12-10
16:18:23,166:28813(0x70000f944000):ZOO_ERROR@handle_socket_error_msg@1702:
Socket [127.0.0.1:54990] zk retcode=-7, errno=60(Operation timed out):
connection to 127.0.0.1:54990 timed out (exceeded timeout by 3412ms)
2018-12-10 16:18:23,166:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597:
Exceeded deadline by 6746ms
I1210 16:18:23.166213 236376064 group.cpp:452] Lost connection to ZooKeeper,
attempting to reconnect ...
2018-12-10 16:18:25,268:28813(0x70000f4b5000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90003, negotiated timeout=10000
I1210 16:18:25.268710 237449216 group.cpp:341] Group process
(zookeeper-group(30)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:25.268859 237449216 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I1210 16:18:25.268879 237449216 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
2018-12-10 16:18:25,269:28813(0x70000ed1d000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90004, negotiated timeout=10000
I1210 16:18:25.269650 239058944 group.cpp:341] Group process
(zookeeper-group(28)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:25.269680 239058944 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I1210 16:18:25.269692 239058944 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
2018-12-10 16:18:26,509:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597:
Exceeded deadline by 10088ms
2018-12-10 16:18:26,509:28813(0x70000f944000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
2018-12-10
16:18:26,510:28813(0x70000f944000):ZOO_ERROR@handle_socket_error_msg@1800:
Socket [127.0.0.1:54990] zk retcode=-112, errno=70(Stale NFS file handle):
sessionId=0x1679aa0ddc90001 has expired.
I1210 16:18:26.510257 237985792 group.cpp:511] ZooKeeper session expired
I1210 16:18:26.510342 236376064 detector.cpp:140] The current leader (id=1) is
lost
I1210 16:18:26.510363 236376064 detector.cpp:152] Detected a new leader: None
I1210 16:18:26.510382 235839488 contender.cpp:217] Membership cancelled: 1
2018-12-10 16:18:26,510:28813(0x70000e2f6000):ZOO_INFO@zookeeper_close@2579:
Freeing zookeeper resources for sessionId=0x1679aa0ddc90001
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@753: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@757: Client
environment:host.name=Jenkinss-Mac-mini.local
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@764: Client
environment:os.name=Darwin
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@765: Client
environment:os.arch=18.2.0
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@766: Client
environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST
2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@774: Client
environment:user.name=jenkins
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@782: Client
environment:user.home=/Users/jenkins
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@log_env@794: Client
environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2018-12-10 16:18:26,510:28813(0x70000e3fc000):ZOO_INFO@zookeeper_init@827:
Initiating client connection, host=127.0.0.1:54990 sessionTimeout=10000
watcher=0x116d03e00 sessionId=0 sessionPasswd=<null> context=0x7fd3885d9bc0
flags=0
2018-12-10 16:18:26,511:28813(0x70000f944000):ZOO_INFO@check_events@1764:
initiated connection to server [127.0.0.1:54990]
2018-12-10 16:18:26,512:28813(0x70000f944000):ZOO_INFO@check_events@1811:
session establishment complete on server [127.0.0.1:54990],
sessionId=0x1679aa0ddc90005, negotiated timeout=10000
I1210 16:18:26.512154 239058944 group.cpp:341] Group process
(zookeeper-group(29)@10.0.49.4:54069) connected to ZooKeeper
I1210 16:18:26.512214 239058944 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I1210 16:18:26.512230 239058944 group.cpp:419] Trying to create path '/mesos'
in ZooKeeper
2018-12-10 16:18:29,859:28813(0x70000f944000):ZOO_WARN@zookeeper_interest@1597:
Exceeded deadline by 13ms
../../src/tests/master_contender_detector_tests.cpp:872: Failure
Failed to wait 15secs for detected
2018-12-10 16:18:38,091:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562:
Closing zookeeper sessionId=0x1679aa0ddc90003 to [127.0.0.1:54990]
I1210 16:18:38.091266 338650560 contender.cpp:207] Now cancelling the
membership: 1
2018-12-10 16:18:38,091:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562:
Closing zookeeper sessionId=0x1679aa0ddc90005 to [127.0.0.1:54990]
I1210 16:18:38.091816 338650560 contender.cpp:207] Now cancelling the
membership: 0
2018-12-10 16:18:38,092:28813(0x1142f65c0):ZOO_INFO@zookeeper_close@2562:
Closing zookeeper sessionId=0x1679aa0ddc90004 to [127.0.0.1:54990]
I1210 16:18:38.092182 338650560 zookeeper_test_server.cpp:116] Shutting down
ZooKeeperTestServer on port 54990
[ FAILED ]
ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster
(25032 ms)
{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)