haosdent created MESOS-5161:
-------------------------------
Summary: GroupTest.GroupCancelWithDisconnect is flaky
Key: MESOS-5161
URL: https://issues.apache.org/jira/browse/MESOS-5161
Project: Mesos
Issue Type: Bug
Reporter: haosdent
An abnormal run looks like:
{code}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from GroupTest
[ RUN ] GroupTest.GroupCancelWithDisconnect
I0410 12:27:52.546856 16775 zookeeper_test_server.cpp:156] Started
ZooKeeperTestServer on port 39846
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@726: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@730: Client
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@737: Client
environment:os.name=Linux
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@738: Client
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@739: Client
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@747: Client
environment:user.name=haosdent
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@755: Client
environment:user.home=/root
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@767: Client
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@zookeeper_init@800:
Initiating client connection, host=127.0.0.1:39846 sessionTimeout=10000
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae1c000ec0
flags=0
2016-04-10 12:27:52,547:16775(0x7fadf57f1700):ZOO_INFO@check_events@1732:
initiated connection to server [127.0.0.1:39846]
2016-04-10 12:27:52,578:16775(0x7fadf57f1700):ZOO_INFO@check_events@1779:
session establishment complete on server [127.0.0.1:39846],
sessionId=0x153fe6d0b620000, negotiated timeout=10000
I0410 12:27:52.578307 16792 group.cpp:347] Group process
(group(134)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.578377 16792 group.cpp:829] Syncing group operations: queue size
(joins, cancels, datas) = (1, 0, 0)
I0410 12:27:52.578397 16792 group.cpp:425] Trying to create path '/test' in
ZooKeeper
I0410 12:27:52.646234 16793 group.cpp:698] Trying to get '/test/0000000000' in
ZooKeeper
I0410 12:27:52.647238 16775 zookeeper_test_server.cpp:116] Shutting down
ZooKeeperTestServer on port 39846
2016-04-10
12:27:52,647:16775(0x7fadf57f1700):ZOO_ERROR@handle_socket_error_msg@1750:
Socket [127.0.0.1:39846] zk retcode=-4, errno=112(Host is down): failed while
receiving a server response
I0410 12:27:52.647831 16794 group.cpp:658] Trying to remove '/test/0000000000'
in ZooKeeper
2016-04-10 12:27:52,648:16775(0x7fadf57f1700):ZOO_INFO@check_events@1732:
initiated connection to server [127.0.0.1:39846]
2016-04-10
12:27:52,648:16775(0x7fadf57f1700):ZOO_ERROR@handle_socket_error_msg@1750:
Socket [127.0.0.1:39846] zk retcode=-4, errno=112(Host is down): failed while
receiving a server response
I0410 12:27:52.648946 16775 zookeeper_test_server.cpp:156] Started
ZooKeeperTestServer on port 39846
I0410 12:27:52.649090 16794 group.cpp:458] Lost connection to ZooKeeper,
attempting to reconnect ...
W0410 12:27:52.660513 16795 group.cpp:501] Timed out waiting to connect to
ZooKeeper. Forcing ZooKeeper session (sessionId=153fe6d0b620000) expiration
I0410 12:27:52.661034 16795 group.cpp:517] ZooKeeper session expired
2016-04-10 12:27:52,661:16775(0x7fae3d115700):ZOO_INFO@zookeeper_close@2547:
Freeing zookeeper resources for sessionId=0x153fe6d0b620000
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@726: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@730: Client
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@737: Client
environment:os.name=Linux
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@738: Client
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@739: Client
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@747: Client
environment:user.name=haosdent
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@755: Client
environment:user.home=/root
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@767: Client
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@zookeeper_init@800:
Initiating client connection, host=127.0.0.1:39846 sessionTimeout=10000
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae1c000ec0
flags=0
2016-04-10 12:27:52,661:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732:
initiated connection to server [127.0.0.1:39846]
2016-04-10 12:27:52,694:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779:
session establishment complete on server [127.0.0.1:39846],
sessionId=0x153fe6d0bc80000, negotiated timeout=10000
I0410 12:27:52.694944 16795 group.cpp:347] Group process
(group(134)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.694988 16795 group.cpp:829] Syncing group operations: queue size
(joins, cancels, datas) = (0, 1, 0)
I0410 12:27:52.695009 16795 group.cpp:425] Trying to create path '/test' in
ZooKeeper
I0410 12:27:52.695755 16795 group.cpp:658] Trying to remove '/test/0000000000'
in ZooKeeper
F0410 12:27:52.728178 16795 group.cpp:681] Check failed:
owned.count(membership.id()) == 1
*** Check failure stack trace: ***
@ 0x7fae4b5a1c92 google::LogMessage::Fail()
@ 0x7fae4b5a1bf1 google::LogMessage::SendToLog()
@ 0x7fae4b5a1602 google::LogMessage::Flush()
@ 0x7fae4b5a4336 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fae4aec53d5 zookeeper::GroupProcess::doCancel()
@ 0x7fae4aec750f zookeeper::GroupProcess::sync()
@ 0x7fae4aec2c8c zookeeper::GroupProcess::connected()
@ 0x7fae4aeeb681
_ZZN7process8dispatchIN9zookeeper12GroupProcessElblbEEvRKNS_3PIDIT_EEMS4_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESF_
@ 0x7fae4aeebc1b
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN9zookeeper12GroupProcessElblbEEvRKNS0_3PIDIT_EEMS8_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7fae4b51d439 std::function<>::operator()()
@ 0x7fae4b503019 process::ProcessBase::visit()
@ 0x7fae4b508d50 process::DispatchEvent::visit()
@ 0x9bf57e process::ProcessBase::serve()
@ 0x7fae4b4ff34f process::ProcessManager::resume()
@ 0x7fae4b4fbfde
_ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
@ 0x7fae4b50853c
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
@ 0x7fae4b5084ec
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
@ 0x7fae4b50847e
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@ 0x7fae4b5083d5
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
@ 0x7fae4b50836e
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
@ 0x7fae462b2220 (unknown)
@ 0x7fae46918dc5 start_thread
@ 0x7fae45a1a28d __clone
[haosdent@centos7-sandbox build]$
{code}
And a normal run:
{code}
[ RUN ] GroupTest.GroupCancelWithDisconnect
I0410 12:27:52.347753 16775 zookeeper_test_server.cpp:156] Started
ZooKeeperTestServer on port 48812
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@726: Client
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@730: Client
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@737: Client
environment:os.name=Linux
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@738: Client
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@739: Client
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@747: Client
environment:user.name=haosdent
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@755: Client
environment:user.home=/root
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@767: Client
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@zookeeper_init@800:
Initiating client connection, host=127.0.0.1:48812 sessionTimeout=10000
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae24001250
flags=0
2016-04-10 12:27:52,348:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732:
initiated connection to server [127.0.0.1:48812]
2016-04-10 12:27:52,377:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779:
session establishment complete on server [127.0.0.1:48812],
sessionId=0x153fe6d0a9b0000, negotiated timeout=10000
I0410 12:27:52.378319 16789 group.cpp:347] Group process
(group(133)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.378376 16789 group.cpp:829] Syncing group operations: queue size
(joins, cancels, datas) = (1, 0, 0)
I0410 12:27:52.378401 16789 group.cpp:425] Trying to create path '/test' in
ZooKeeper
I0410 12:27:52.446169 16791 group.cpp:698] Trying to get '/test/0000000000' in
ZooKeeper
I0410 12:27:52.447144 16775 zookeeper_test_server.cpp:116] Shutting down
ZooKeeperTestServer on port 48812
2016-04-10
12:27:52,447:16775(0x7fadf4ff0700):ZOO_ERROR@handle_socket_error_msg@1750:
Socket [127.0.0.1:48812] zk retcode=-4, errno=112(Host is down): failed while
receiving a server response
I0410 12:27:52.447749 16789 group.cpp:458] Lost connection to ZooKeeper,
attempting to reconnect ...
I0410 12:27:52.448313 16789 group.cpp:658] Trying to remove '/test/0000000000'
in ZooKeeper
2016-04-10 12:27:52,448:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732:
initiated connection to server [127.0.0.1:48812]
I0410 12:27:52.448839 16775 zookeeper_test_server.cpp:156] Started
ZooKeeperTestServer on port 48812
2016-04-10 12:27:52,448:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779:
session establishment complete on server [127.0.0.1:48812],
sessionId=0x153fe6d0a9b0000, negotiated timeout=10000
I0410 12:27:52.478387 16789 group.cpp:347] Group process
(group(133)@122.11.130.70:38110) reconnected to ZooKeeper
I0410 12:27:52.478449 16789 group.cpp:829] Syncing group operations: queue size
(joins, cancels, datas) = (0, 0, 0)
2016-04-10 12:27:52,499:16775(0x7fae4011b700):ZOO_INFO@zookeeper_close@2530:
Closing zookeeper sessionId=0x153fe6d0a9b0000 to [127.0.0.1:48812]
I0410 12:27:52.500155 16775 zookeeper_test_server.cpp:116] Shutting down
ZooKeeperTestServer on port 48812
{code}
It is because when cancel operation happens after shutdown zookeeper, the
{{owned.clear()}} would be called in {{GroupProcess::expired}}. And after
zookeeper restart successfully, {{GroupProcess::sync}} start to consume
{{pending.cancels}} by {{GroupProcess::doCancel}}. Then the check in
{{CHECK(owned.count(membership.id()) == 1);}} could not pass.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)