OK, thanks Vinod. I will try it.

Guodong

On Wed, Jul 3, 2013 at 12:31 PM, Vinod Kone <[email protected]> wrote:

> I think this was recently fixed. Can you try building from the latest
> "master"?
>
> On Tue, Jul 2, 2013 at 8:05 PM, 王国栋 <[email protected]> wrote:
>
> > I have been doing some failover testing on Mesos recently.
> >
> > The code I am using was pulled from git master. In the following case,
> > I find that a slave may crash from time to time.
> >
> > Steps to reproduce:
> > 1. Start the Mesos cluster.
> > 2. Start the Hadoop jobtracker; the jobtracker then registers with Mesos.
> > 3. Submit some Hadoop jobs and keep them running.
> > 4. Kill all the Mesos masters and slaves.
> > 5. Restart the Mesos cluster.
> >
> > After the slaves are restarted, some of them sometimes crash. Here is
> > the log from one such slave; I hope it helps.
> >
> > I0702 19:03:32.684700 24900 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306860088778333days
> > 2013-07-02 19:03:33,174:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 28ms
> > 2013-07-02 19:03:33,180:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 5 ms
> > 2013-07-02 19:03:36,565:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 57ms
> > 2013-07-02 19:03:36,566:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:03:39,906:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > 2013-07-02 19:03:43,245:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 12ms
> > 2013-07-02 19:03:43,292:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 46 ms
> > 2013-07-02 19:03:46,588:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 9 ms
> > 2013-07-02 19:03:49,913:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:03:53,277:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 31ms
> > 2013-07-02 19:03:53,293:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 15 ms
> > 2013-07-02 19:03:56,611:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:03:59,967:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 22ms
> > 2013-07-02 19:03:59,968:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:04:03,335:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 33 ms
> > 2013-07-02 19:04:06,672:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 36ms
> > 2013-07-02 19:04:06,691:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 18 ms
> > 2013-07-02 19:04:10,012:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> > 2013-07-02 19:04:13,344:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> > 2013-07-02 19:04:16,707:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 32ms
> > 2013-07-02 19:04:16,737:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 30 ms
> > 2013-07-02 19:04:20,057:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16ms
> > 2013-07-02 19:04:20,067:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 10 ms
> > 2013-07-02 19:04:23,410:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
> > 2013-07-02 19:04:23,411:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > 2013-07-02 19:04:26,820:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 77ms
> > 2013-07-02 19:04:26,919:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 98 ms
> > 2013-07-02 19:04:30,163:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > I0702 19:04:32.685693 24892 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306755345349155days
> > 2013-07-02 19:04:33,514:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 17 ms
> > 2013-07-02 19:04:36,832:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > 2013-07-02 19:04:40,164:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:04:43,498:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> > 2013-07-02 19:04:46,878:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 46ms
> > 2013-07-02 19:04:46,880:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> > 2013-07-02 19:04:50,282:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 71 ms
> > 2013-07-02 19:04:53,565:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 19 ms
> > Result::get() but state == NONE
> > *** Aborted at 1372763096 (unix time) try "date -d @1372763096" if you are using GNU date ***
> > PC: @ 0x3d87a30215 (unknown)
> > *** SIGABRT (@0x613a) received by PID 24890 (TID 0x4878f940) from PID 24890; stack trace: ***
> >     @ 0x3d8860e4c0 (unknown)
> >     @ 0x3d87a30215 (unknown)
> >     @ 0x3d87a31cc0 (unknown)
> >     @ 0x2b02c1bf96e5 mesos::internal::slave::ProcessIsolator::usage()
> >     @ 0x2b02c1b59a30 std::tr1::_Function_handler<>::_M_invoke()
> >     @ 0x2b02c1b5a361 std::tr1::function<>::operator()()
> >     @ 0x2b02c1b63f2b process::internal::pdispatcher<>()
> >     @ 0x2b02c1b5c45e std::tr1::_Function_handler<>::_M_invoke()
> >     @ 0x2b02c1dbf205 process::ProcessManager::resume()
> >     @ 0x2b02c1dbfbbf process::schedule()
> >     @ 0x3d88606367 (unknown)
> >     @ 0x3d87ad30ad (unknown)
> >
> > Guodong
