> On June 12, 2014, 6:09 p.m., Ben Mahler wrote:
> > I think the subject is a bit off, should say "Reregister", not "Register", right?
> > 
> > Did you run this with repetition to see if it is flaky still?
> > 
> > $ ./bin/mesos-tests.sh --gtest_filter="SlaveTest.TerminatingSlaveDoesNotReregister" --gtest_repeat=-1 --gtest_break_on_failure --verbose
> 
> Yifan Gu wrote:
> Thanks for the cool advice. I run
> $ ./bin/mesos-tests.sh --gtest_filter="SlaveTest.TerminatingSlaveDoesNotReregister" --gtest_repeat=-1 --gtest_break_on_failure --verbose
> 
> And in the 13454th iteration, it gets a new error, looks like the master failed to start.
> 
> 
> Repeating all tests (iteration 13454) . . .
> 
> Note: Google Test filter = SlaveTest.TerminatingSlaveDoesNotReregister-CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3:
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from SlaveTest
> [ RUN ] SlaveTest.TerminatingSlaveDoesNotReregister
> Using temporary directory '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_O9kh4V'
> I0612 19:03:17.706805 2910 leveldb.cpp:176] Opened db in 15.704031ms
> I0612 19:03:17.712888 2910 leveldb.cpp:183] Compacted db in 6.057101ms
> I0612 19:03:17.712910 2910 leveldb.cpp:198] Created db iterator in 2075ns
> I0612 19:03:17.712920 2910 leveldb.cpp:204] Seeked to beginning of db in 365ns
> I0612 19:03:17.712929 2910 leveldb.cpp:273] Iterated through 0 keys in the db in 96ns
> I0612 19:03:17.712939 2910 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0612 19:03:17.713034 2933 recover.cpp:425] Starting replica recovery
> I0612 19:03:17.713165 2925 recover.cpp:451] Replica is in EMPTY status
> I0612 19:03:17.713366 2925 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> I0612 19:03:17.713471 2924 master.cpp:280] Master 20140612-190317-3823062160-44846-2910 (chimney.mesosphere.io) started on 144.76.223.227:44846
> I0612 19:03:17.713497 2924 master.cpp:317] Master only allowing authenticated frameworks to register
> I0612 19:03:17.713507 2924 master.cpp:322] Master only allowing authenticated slaves to register
> I0612 19:03:17.713515 2924 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveTest_TerminatingSlaveDoesNotReregister_O9kh4V/credentials'
> I0612 19:03:17.713517 2933 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0612 19:03:17.713564 2924 master.cpp:348] Authorization enabled
> I0612 19:03:17.713625 2928 recover.cpp:542] Updating replica status to STARTING
> I0612 19:03:17.713819 2933 master.cpp:961] The newly elected leader is master@144.76.223.227:44846 with id 20140612-190317-3823062160-44846-2910
> I0612 19:03:17.719408 2934 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.73482ms
> I0612 19:03:32.107343 2933 master.cpp:974] Elected as the leading master!
> I0612 19:03:32.107364 2934 replica.cpp:320] Persisted replica status to STARTING
> F0612 19:03:27.714102 2910 cluster.hpp:427] Failed to wait for _recover
> *** Check failure stack trace: ***
> I0612 19:03:32.107374 2933 master.cpp:792] Recovering from registrar
> I0612 19:03:32.107522 2934 recover.cpp:451] Replica is in STARTING status
> I0612 19:03:32.107746 2929 registrar.cpp:313] Recovering registrar
> I0612 19:03:32.108326 2925 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
> I0612 19:03:32.108497 2931 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0612 19:03:32.108778 2929 recover.cpp:542] Updating replica status to VOTING
> @ 0x7f4c0cc3dc3d google::LogMessage::Fail()
> @ 0x7f4c0cc3fa7d google::LogMessage::SendToLog()
> @ 0x7f4c0cc3d82c google::LogMessage::Flush()
> @ 0x7f4c0cc40379 google::LogMessageFatal::~LogMessageFatal()
> @ 0x73b9db mesos::internal::tests::Cluster::Masters::start()
> @ 0x736885 mesos::internal::tests::MesosTest::StartMaster()
> @ 0x826fbf SlaveTest_TerminatingSlaveDoesNotReregister_Test::TestBody()
> @ 0x8cfbb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x8c8e87 testing::Test::Run()
> @ 0x8c8f2e testing::TestInfo::Run()
> @ 0x8c9035 testing::TestCase::Run()
> @ 0x8c92d8 testing::internal::UnitTestImpl::RunAllTests()
> I0612 19:03:32.117660 2932 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.736907ms
> I0612 19:03:32.117678 2932 replica.cpp:320] Persisted replica status to VOTING
> I0612 19:03:32.117710 2931 recover.cpp:556] Successfully joined the Paxos group
> @ 0x8c9577 testing::UnitTest::Run()
> I0612 19:03:32.117769 2931 recover.cpp:440] Recover process terminated
> @ 0x48b01d main
> I0612 19:03:32.117884 2928 log.cpp:656] Attempting to start the writer
> @ 0x7f4c0af73de5 (unknown)
> I0612 19:03:32.118140 2929 replica.cpp:474] Replica received implicit promise request with proposal 1
> @ 0x498144 (unknown)
> Aborted
> 
> 
> Ben Mahler wrote:
> Thanks Yifan, that looks like an orthogonal issue (strange that the master took more than 10 seconds to realize it was elected).
> 
> Will get this committed for you.

Thanks Ben!


- Yifan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22472/#review45516
-----------------------------------------------------------


On June 12, 2014, 7:15 p.m., Yifan Gu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22472/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 7:15 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone.
> 
> 
> Bugs: MESOS-1460
> https://issues.apache.org/jira/browse/MESOS-1460
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Ignored subsequent status updates.
> Muted warnings by catching mock calls.
> 
> 
> Diffs
> -----
> 
> src/tests/slave_tests.cpp 2c8f183
> 
> Diff: https://reviews.apache.org/r/22472/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yifan Gu
> 
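
Ben's remark that the master took more than 10 seconds to realize it was elected can be checked against the timestamps in the quoted log. The following is only a rough sketch: the year is an assumption (glog prefixes omit it, so it is taken from the e-mail date), the helper names `glog_time` and `gap_seconds` are made up for illustration, and using the "started on" line as the beginning of the wait for `_recover` is an approximation.

```python
from datetime import datetime

YEAR = "2014"  # assumption: glog prefixes carry no year; taken from the e-mail date


def glog_time(line):
    """Parse the 'mmdd hh:mm:ss.uuuuuu' stamp that follows the severity letter."""
    return datetime.strptime(YEAR + line[1:21], "%Y%m%d %H:%M:%S.%f")


def gap_seconds(earlier, later):
    """Seconds elapsed between two glog lines (hypothetical helper)."""
    return (glog_time(later) - glog_time(earlier)).total_seconds()


# Log prefixes copied from the failing iteration; message text omitted.
STARTED = "I0612 19:03:17.713471 2924 master.cpp:280]"       # master started
LEADER_KNOWN = "I0612 19:03:17.713819 2933 master.cpp:961]"  # newly elected leader is ...
ELECTED = "I0612 19:03:32.107343 2933 master.cpp:974]"       # Elected as the leading master!
FATAL = "F0612 19:03:27.714102 2910 cluster.hpp:427]"        # Failed to wait for _recover

print(f"leader known -> elected: {gap_seconds(LEADER_KNOWN, ELECTED):.6f} s")
print(f"master start -> fatal:   {gap_seconds(STARTED, FATAL):.6f} s")
```

With these stamps the first gap comes out at roughly 14.4 seconds and the second at roughly 10.0 seconds, which would fit a wait of about 10 seconds expiring before the election was noticed. The timeout value itself is not stated in the log, so that reading is an inference.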
