I think this was recently fixed. Can you try building from the latest "master"?
On Tue, Jul 2, 2013 at 8:05 PM, 王国栋 <[email protected]> wrote:

> I have been doing some failover testing with Mesos recently.
>
> The code I am using was pulled from git master. In the following case, I
> find that a slave may crash from time to time.
>
> Steps to reproduce:
> 1. Start the Mesos cluster.
> 2. Start the Hadoop JobTracker, which then registers with Mesos.
> 3. Submit some Hadoop jobs and keep them running.
> 4. Kill all the Mesos master and slave processes.
> 5. Restart the Mesos cluster.
>
> After the slaves are restarted, some of them sometimes crash. Here is the
> log from one such slave; I hope it helps.
>
> I0702 19:03:32.684700 24900 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306860088778333days
> 2013-07-02 19:03:33,174:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 28ms
> 2013-07-02 19:03:33,180:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 5 ms
> 2013-07-02 19:03:36,565:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 57ms
> 2013-07-02 19:03:36,566:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:39,906:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> 2013-07-02 19:03:43,245:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 12ms
> 2013-07-02 19:03:43,292:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 46 ms
> 2013-07-02 19:03:46,588:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 9 ms
> 2013-07-02 19:03:49,913:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:53,277:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 31ms
> 2013-07-02 19:03:53,293:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 15 ms
> 2013-07-02 19:03:56,611:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:59,967:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 22ms
> 2013-07-02 19:03:59,968:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:03,335:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 33 ms
> 2013-07-02 19:04:06,672:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 36ms
> 2013-07-02 19:04:06,691:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 18 ms
> 2013-07-02 19:04:10,012:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> 2013-07-02 19:04:13,344:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> 2013-07-02 19:04:16,707:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 32ms
> 2013-07-02 19:04:16,737:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 30 ms
> 2013-07-02 19:04:20,057:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16ms
> 2013-07-02 19:04:20,067:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 10 ms
> 2013-07-02 19:04:23,410:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
> 2013-07-02 19:04:23,411:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:26,820:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 77ms
> 2013-07-02 19:04:26,919:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 98 ms
> 2013-07-02 19:04:30,163:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> I0702 19:04:32.685693 24892 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306755345349155days
> 2013-07-02 19:04:33,514:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 17 ms
> 2013-07-02 19:04:36,832:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:40,164:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:43,498:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:46,878:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 46ms
> 2013-07-02 19:04:46,880:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:50,282:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 71 ms
> 2013-07-02 19:04:53,565:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 19 ms
> Result::get() but state == NONE
> *** Aborted at 1372763096 (unix time) try "date -d @1372763096" if you are using GNU date ***
> PC: @ 0x3d87a30215 (unknown)
> *** SIGABRT (@0x613a) received by PID 24890 (TID 0x4878f940) from PID 24890; stack trace: ***
> @ 0x3d8860e4c0 (unknown)
> @ 0x3d87a30215 (unknown)
> @ 0x3d87a31cc0 (unknown)
> @ 0x2b02c1bf96e5 mesos::internal::slave::ProcessIsolator::usage()
> @ 0x2b02c1b59a30 std::tr1::_Function_handler<>::_M_invoke()
> @ 0x2b02c1b5a361 std::tr1::function<>::operator()()
> @ 0x2b02c1b63f2b process::internal::pdispatcher<>()
> @ 0x2b02c1b5c45e std::tr1::_Function_handler<>::_M_invoke()
> @ 0x2b02c1dbf205 process::ProcessManager::resume()
> @ 0x2b02c1dbfbbf process::schedule()
> @ 0x3d88606367 (unknown)
> @ 0x3d87ad30ad (unknown)
>
> Guodong
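
For reference, the abort message in the trace, "Result::get() but state == NONE", reads like a value being taken from a result that holds nothing, somewhere under ProcessIsolator::usage(). Below is a minimal sketch of that failure mode and of a guarded caller. MaybeResult and describeUsage are hypothetical names made up for illustration. They are not the Mesos or stout classes, and this is not necessarily how the fix on master works.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical tri-state result type (illustration only, not Mesos/stout code).
// Assumes T is default-constructible.
template <typename T>
class MaybeResult {
public:
  enum class State { Some, None, Error };

  static MaybeResult some(const T& value) {
    return MaybeResult(State::Some, value, "");
  }
  static MaybeResult none() {
    return MaybeResult(State::None, T(), "");
  }
  static MaybeResult error(const std::string& message) {
    return MaybeResult(State::Error, T(), message);
  }

  bool isSome() const { return state_ == State::Some; }
  bool isNone() const { return state_ == State::None; }
  bool isError() const { return state_ == State::Error; }

  // Mirrors the failure in the log: reading a value that is absent aborts
  // the whole process instead of returning.
  const T& get() const {
    if (state_ != State::Some) {
      std::cerr << "MaybeResult::get() but state != Some" << std::endl;
      std::abort();
    }
    return value_;
  }

  const std::string& message() const { return message_; }

private:
  MaybeResult(State state, const T& value, const std::string& message)
    : state_(state), value_(value), message_(message) {}

  State state_;
  T value_;
  std::string message_;
};

// Guarded caller (hypothetical): checks the state before calling get(), so a
// missing value becomes a message rather than a SIGABRT.
std::string describeUsage(const MaybeResult<double>& usage) {
  if (usage.isError()) {
    return "error: " + usage.message();
  }
  if (usage.isNone()) {
    return "no usage data yet";
  }
  return "cpu usage: " + std::to_string(usage.get());
}
```

With this shape, describeUsage(MaybeResult<double>::none()) returns a "no usage data yet" message, whereas calling get() directly on the same object would abort the process, much as in the trace above.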
