> On June 12, 2014, 6:09 p.m., Ben Mahler wrote:
> > I think the subject is a bit off, should say "Reregister", not "Register", right?
> > 
> > Did you run this with repetition to see if it is flaky still?
> > 
> > $ ./bin/mesos-tests.sh --gtest_filter="SlaveTest.TerminatingSlaveDoesNotReregister" --gtest_repeat=-1 --gtest_break_on_failure --verbose

Thanks for the cool advice. I ran

$ ./bin/mesos-tests.sh --gtest_filter="SlaveTest.TerminatingSlaveDoesNotReregister" --gtest_repeat=-1 --gtest_break_on_failure --verbose

and on the 13454th iteration it hit a new error. It looks like the master failed to start:

Repeating all tests (iteration 13454) . . .

Note: Google Test filter = SlaveTest.TerminatingSlaveDoesNotReregister-CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SlaveTest
[ RUN      ] SlaveTest.TerminatingSlaveDoesNotReregister
Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_O9kh4V'
I0612 19:03:17.706805 2910 leveldb.cpp:176] Opened db in 15.704031ms
I0612 19:03:17.712888 2910 leveldb.cpp:183] Compacted db in 6.057101ms
I0612 19:03:17.712910 2910 leveldb.cpp:198] Created db iterator in 2075ns
I0612 19:03:17.712920 2910 leveldb.cpp:204] Seeked to beginning of db in 365ns
I0612 19:03:17.712929 2910 leveldb.cpp:273] Iterated through 0 keys in the db in 96ns
I0612 19:03:17.712939 2910 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0612 19:03:17.713034 2933 recover.cpp:425] Starting replica recovery
I0612 19:03:17.713165 2925 recover.cpp:451] Replica is in EMPTY status
I0612 19:03:17.713366 2925 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0612 19:03:17.713471 2924 master.cpp:280] Master 20140612-190317-3823062160-44846-2910 (chimney.mesosphere.io) started on 144.76.223.227:44846
I0612 19:03:17.713497 2924 master.cpp:317] Master only allowing authenticated frameworks to register
I0612 19:03:17.713507 2924 master.cpp:322] Master only allowing authenticated slaves to register
I0612 19:03:17.713515 2924 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_O9kh4V/credentials'
I0612 19:03:17.713517 2933 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0612 19:03:17.713564 2924 master.cpp:348] Authorization enabled
I0612 19:03:17.713625 2928 recover.cpp:542] Updating replica status to STARTING
I0612 19:03:17.713819 2933 master.cpp:961] The newly elected leader is [email protected]:44846 with id 20140612-190317-3823062160-44846-2910
I0612 19:03:17.719408 2934 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.73482ms
I0612 19:03:32.107343 2933 master.cpp:974] Elected as the leading master!
I0612 19:03:32.107364 2934 replica.cpp:320] Persisted replica status to STARTING
F0612 19:03:27.714102 2910 cluster.hpp:427] Failed to wait for _recover
*** Check failure stack trace: ***
I0612 19:03:32.107374 2933 master.cpp:792] Recovering from registrar
I0612 19:03:32.107522 2934 recover.cpp:451] Replica is in STARTING status
I0612 19:03:32.107746 2929 registrar.cpp:313] Recovering registrar
I0612 19:03:32.108326 2925 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0612 19:03:32.108497 2931 recover.cpp:188] Received a recover response from a replica in STARTING status
I0612 19:03:32.108778 2929 recover.cpp:542] Updating replica status to VOTING
    @ 0x7f4c0cc3dc3d google::LogMessage::Fail()
    @ 0x7f4c0cc3fa7d google::LogMessage::SendToLog()
    @ 0x7f4c0cc3d82c google::LogMessage::Flush()
    @ 0x7f4c0cc40379 google::LogMessageFatal::~LogMessageFatal()
    @ 0x73b9db mesos::internal::tests::Cluster::Masters::start()
    @ 0x736885 mesos::internal::tests::MesosTest::StartMaster()
    @ 0x826fbf SlaveTest_TerminatingSlaveDoesNotReregister_Test::TestBody()
    @ 0x8cfbb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x8c8e87 testing::Test::Run()
    @ 0x8c8f2e testing::TestInfo::Run()
    @ 0x8c9035 testing::TestCase::Run()
    @ 0x8c92d8 testing::internal::UnitTestImpl::RunAllTests()
I0612 19:03:32.117660 2932 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.736907ms
I0612 19:03:32.117678 2932 replica.cpp:320] Persisted replica status to VOTING
I0612 19:03:32.117710 2931 recover.cpp:556] Successfully joined the Paxos group
    @ 0x8c9577 testing::UnitTest::Run()
I0612 19:03:32.117769 2931 recover.cpp:440] Recover process terminated
    @ 0x48b01d main
I0612 19:03:32.117884 2928 log.cpp:656] Attempting to start the writer
    @ 0x7f4c0af73de5 (unknown)
I0612 19:03:32.118140 2929 replica.cpp:474] Replica received implicit promise request with proposal 1
    @ 0x498144 (unknown)
Aborted


- Yifan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22472/#review45516
-----------------------------------------------------------


On June 12, 2014, 7:15 p.m., Yifan Gu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22472/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 7:15 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone.
> 
> 
> Bugs: MESOS-1460
>     https://issues.apache.org/jira/browse/MESOS-1460
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Ignored subsequent status updates.
> Muted warnings by catching mock calls.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp 2c8f183
> 
> Diff: https://reviews.apache.org/r/22472/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>
