haosdent created MESOS-5161:
-------------------------------

             Summary: GroupTest.GroupCancelWithDisconnect is flaky
                 Key: MESOS-5161
                 URL: https://issues.apache.org/jira/browse/MESOS-5161
             Project: Mesos
          Issue Type: Bug
            Reporter: haosdent


An abnormal run looks like:
{code}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from GroupTest
[ RUN      ] GroupTest.GroupCancelWithDisconnect
I0410 12:27:52.546856 16775 zookeeper_test_server.cpp:156] Started 
ZooKeeperTestServer on port 39846
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@726: Client 
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@730: Client 
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@737: Client 
environment:os.name=Linux
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@738: Client 
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@739: Client 
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@747: Client 
environment:user.name=haosdent
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@755: Client 
environment:user.home=/root
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@log_env@767: Client 
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,547:16775(0x7fae3d916700):ZOO_INFO@zookeeper_init@800: 
Initiating client connection, host=127.0.0.1:39846 sessionTimeout=10000 
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae1c000ec0 
flags=0
2016-04-10 12:27:52,547:16775(0x7fadf57f1700):ZOO_INFO@check_events@1732: 
initiated connection to server [127.0.0.1:39846]
2016-04-10 12:27:52,578:16775(0x7fadf57f1700):ZOO_INFO@check_events@1779: 
session establishment complete on server [127.0.0.1:39846], 
sessionId=0x153fe6d0b620000, negotiated timeout=10000
I0410 12:27:52.578307 16792 group.cpp:347] Group process 
(group(134)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.578377 16792 group.cpp:829] Syncing group operations: queue size 
(joins, cancels, datas) = (1, 0, 0)
I0410 12:27:52.578397 16792 group.cpp:425] Trying to create path '/test' in 
ZooKeeper
I0410 12:27:52.646234 16793 group.cpp:698] Trying to get '/test/0000000000' in 
ZooKeeper
I0410 12:27:52.647238 16775 zookeeper_test_server.cpp:116] Shutting down 
ZooKeeperTestServer on port 39846
2016-04-10 
12:27:52,647:16775(0x7fadf57f1700):ZOO_ERROR@handle_socket_error_msg@1750: 
Socket [127.0.0.1:39846] zk retcode=-4, errno=112(Host is down): failed while 
receiving a server response
I0410 12:27:52.647831 16794 group.cpp:658] Trying to remove '/test/0000000000' 
in ZooKeeper
2016-04-10 12:27:52,648:16775(0x7fadf57f1700):ZOO_INFO@check_events@1732: 
initiated connection to server [127.0.0.1:39846]
2016-04-10 
12:27:52,648:16775(0x7fadf57f1700):ZOO_ERROR@handle_socket_error_msg@1750: 
Socket [127.0.0.1:39846] zk retcode=-4, errno=112(Host is down): failed while 
receiving a server response
I0410 12:27:52.648946 16775 zookeeper_test_server.cpp:156] Started 
ZooKeeperTestServer on port 39846
I0410 12:27:52.649090 16794 group.cpp:458] Lost connection to ZooKeeper, 
attempting to reconnect ...
W0410 12:27:52.660513 16795 group.cpp:501] Timed out waiting to connect to 
ZooKeeper. Forcing ZooKeeper session (sessionId=153fe6d0b620000) expiration
I0410 12:27:52.661034 16795 group.cpp:517] ZooKeeper session expired
2016-04-10 12:27:52,661:16775(0x7fae3d115700):ZOO_INFO@zookeeper_close@2547: 
Freeing zookeeper resources for sessionId=0x153fe6d0b620000

2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@726: Client 
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@730: Client 
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@737: Client 
environment:os.name=Linux
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@738: Client 
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@739: Client 
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@747: Client 
environment:user.name=haosdent
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@755: Client 
environment:user.home=/root
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@log_env@767: Client 
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,661:16775(0x7fae3f119700):ZOO_INFO@zookeeper_init@800: 
Initiating client connection, host=127.0.0.1:39846 sessionTimeout=10000 
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae1c000ec0 
flags=0
2016-04-10 12:27:52,661:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732: 
initiated connection to server [127.0.0.1:39846]
2016-04-10 12:27:52,694:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779: 
session establishment complete on server [127.0.0.1:39846], 
sessionId=0x153fe6d0bc80000, negotiated timeout=10000
I0410 12:27:52.694944 16795 group.cpp:347] Group process 
(group(134)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.694988 16795 group.cpp:829] Syncing group operations: queue size 
(joins, cancels, datas) = (0, 1, 0)
I0410 12:27:52.695009 16795 group.cpp:425] Trying to create path '/test' in 
ZooKeeper
I0410 12:27:52.695755 16795 group.cpp:658] Trying to remove '/test/0000000000' 
in ZooKeeper
F0410 12:27:52.728178 16795 group.cpp:681] Check failed: 
owned.count(membership.id()) == 1
*** Check failure stack trace: ***
    @     0x7fae4b5a1c92  google::LogMessage::Fail()
    @     0x7fae4b5a1bf1  google::LogMessage::SendToLog()
    @     0x7fae4b5a1602  google::LogMessage::Flush()
    @     0x7fae4b5a4336  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fae4aec53d5  zookeeper::GroupProcess::doCancel()
    @     0x7fae4aec750f  zookeeper::GroupProcess::sync()
    @     0x7fae4aec2c8c  zookeeper::GroupProcess::connected()
    @     0x7fae4aeeb681  
_ZZN7process8dispatchIN9zookeeper12GroupProcessElblbEEvRKNS_3PIDIT_EEMS4_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESF_
    @     0x7fae4aeebc1b  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN9zookeeper12GroupProcessElblbEEvRKNS0_3PIDIT_EEMS8_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
    @     0x7fae4b51d439  std::function<>::operator()()
    @     0x7fae4b503019  process::ProcessBase::visit()
    @     0x7fae4b508d50  process::DispatchEvent::visit()
    @           0x9bf57e  process::ProcessBase::serve()
    @     0x7fae4b4ff34f  process::ProcessManager::resume()
    @     0x7fae4b4fbfde  
_ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
    @     0x7fae4b50853c  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
    @     0x7fae4b5084ec  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
    @     0x7fae4b50847e  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
    @     0x7fae4b5083d5  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
    @     0x7fae4b50836e  
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
    @     0x7fae462b2220  (unknown)
    @     0x7fae46918dc5  start_thread
    @     0x7fae45a1a28d  __clone
[haosdent@centos7-sandbox build]$
{code}

And a normal run:
{code}
[ RUN      ] GroupTest.GroupCancelWithDisconnect
I0410 12:27:52.347753 16775 zookeeper_test_server.cpp:156] Started 
ZooKeeperTestServer on port 48812
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@726: Client 
environment:zookeeper.version=zookeeper C client 3.4.8
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@730: Client 
environment:host.name=centos7-sandbox
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@737: Client 
environment:os.name=Linux
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@738: Client 
environment:os.arch=3.10.0-327.10.1.el7.x86_64
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@739: Client 
environment:os.version=#1 SMP Tue Feb 16 17:03:50 UTC 2016
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@747: Client 
environment:user.name=haosdent
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@755: Client 
environment:user.home=/root
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@log_env@767: Client 
environment:user.dir=/home/haosdent/mesos/build
2016-04-10 12:27:52,348:16775(0x7fae3d115700):ZOO_INFO@zookeeper_init@800: 
Initiating client connection, host=127.0.0.1:48812 sessionTimeout=10000 
watcher=0x7fae4aeae226 sessionId=0 sessionPasswd=<null> context=0x7fae24001250 
flags=0
2016-04-10 12:27:52,348:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732: 
initiated connection to server [127.0.0.1:48812]
2016-04-10 12:27:52,377:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779: 
session establishment complete on server [127.0.0.1:48812], 
sessionId=0x153fe6d0a9b0000, negotiated timeout=10000
I0410 12:27:52.378319 16789 group.cpp:347] Group process 
(group(133)@122.11.130.70:38110) connected to ZooKeeper
I0410 12:27:52.378376 16789 group.cpp:829] Syncing group operations: queue size 
(joins, cancels, datas) = (1, 0, 0)
I0410 12:27:52.378401 16789 group.cpp:425] Trying to create path '/test' in 
ZooKeeper
I0410 12:27:52.446169 16791 group.cpp:698] Trying to get '/test/0000000000' in 
ZooKeeper
I0410 12:27:52.447144 16775 zookeeper_test_server.cpp:116] Shutting down 
ZooKeeperTestServer on port 48812
2016-04-10 
12:27:52,447:16775(0x7fadf4ff0700):ZOO_ERROR@handle_socket_error_msg@1750: 
Socket [127.0.0.1:48812] zk retcode=-4, errno=112(Host is down): failed while 
receiving a server response
I0410 12:27:52.447749 16789 group.cpp:458] Lost connection to ZooKeeper, 
attempting to reconnect ...
I0410 12:27:52.448313 16789 group.cpp:658] Trying to remove '/test/0000000000' 
in ZooKeeper
2016-04-10 12:27:52,448:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1732: 
initiated connection to server [127.0.0.1:48812]
I0410 12:27:52.448839 16775 zookeeper_test_server.cpp:156] Started 
ZooKeeperTestServer on port 48812
2016-04-10 12:27:52,448:16775(0x7fadf4ff0700):ZOO_INFO@check_events@1779: 
session establishment complete on server [127.0.0.1:48812], 
sessionId=0x153fe6d0a9b0000, negotiated timeout=10000
I0410 12:27:52.478387 16789 group.cpp:347] Group process 
(group(133)@122.11.130.70:38110) reconnected to ZooKeeper
I0410 12:27:52.478449 16789 group.cpp:829] Syncing group operations: queue size 
(joins, cancels, datas) = (0, 0, 0)
2016-04-10 12:27:52,499:16775(0x7fae4011b700):ZOO_INFO@zookeeper_close@2530: 
Closing zookeeper sessionId=0x153fe6d0a9b0000 to [127.0.0.1:48812]

I0410 12:27:52.500155 16775 zookeeper_test_server.cpp:116] Shutting down 
ZooKeeperTestServer on port 48812
{code}

It is because when cancel operation happens after shutdown zookeeper, the 
{{owned.clear()}} would be called in {{GroupProcess::expired}}. And after 
zookeeper restart successfully, {{GroupProcess::sync}} start to consume 
{{pending.cancels}} by {{GroupProcess::doCancel}}. Then the check in 
{{CHECK(owned.count(membership.id()) == 1);}} could not pass.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to