I think this was recently fixed. Can you try building from the latest
"master"?


On Tue, Jul 2, 2013 at 8:05 PM, 王国栋 <[email protected]> wrote:

> I have been running some failover tests on Mesos recently.
>
> The code I am using was pulled from git master. In the following case, I
> find that a slave may crash from time to time.
>
> Steps to reproduce:
> 1. Start the Mesos cluster.
> 2. Start the Hadoop JobTracker, which then registers with Mesos.
> 3. Submit some Hadoop jobs and keep them running.
> 4. Kill all the Mesos masters and slaves.
> 5. Restart the Mesos cluster.
>
> After the slaves are restarted, some of them sometimes crash. Here is
> the log from one such slave; I hope it helps.
>
> I0702 19:03:32.684700 24900 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306860088778333days
> 2013-07-02 19:03:33,174:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 28ms
> 2013-07-02 19:03:33,180:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 5 ms
> 2013-07-02 19:03:36,565:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 57ms
> 2013-07-02 19:03:36,566:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:39,906:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> 2013-07-02 19:03:43,245:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 12ms
> 2013-07-02 19:03:43,292:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 46 ms
> 2013-07-02 19:03:46,588:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 9 ms
> 2013-07-02 19:03:49,913:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:53,277:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 31ms
> 2013-07-02 19:03:53,293:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 15 ms
> 2013-07-02 19:03:56,611:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:03:59,967:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 22ms
> 2013-07-02 19:03:59,968:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:03,335:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 33 ms
> 2013-07-02 19:04:06,672:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 36ms
> 2013-07-02 19:04:06,691:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 18 ms
> 2013-07-02 19:04:10,012:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 6 ms
> 2013-07-02 19:04:13,344:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 3 ms
> 2013-07-02 19:04:16,707:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 32ms
> 2013-07-02 19:04:16,737:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 30 ms
> 2013-07-02 19:04:20,057:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 16ms
> 2013-07-02 19:04:20,067:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 10 ms
> 2013-07-02 19:04:23,410:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
> 2013-07-02 19:04:23,411:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:26,820:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 77ms
> 2013-07-02 19:04:26,919:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 98 ms
> 2013-07-02 19:04:30,163:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> I0702 19:04:32.685693 24892 slave.cpp:2510] Current usage 71.33%. Max allowed age: 1.306755345349155days
> 2013-07-02 19:04:33,514:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 17 ms
> 2013-07-02 19:04:36,832:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:40,164:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:43,498:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> 2013-07-02 19:04:46,878:24890(0x41057940):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 46ms
> 2013-07-02 19:04:46,880:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 1 ms
> 2013-07-02 19:04:50,282:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 71 ms
> 2013-07-02 19:04:53,565:24890(0x41057940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 19 ms
> Result::get() but state == NONE
> *** Aborted at 1372763096 (unix time) try "date -d @1372763096" if you are using GNU date ***
> PC: @       0x3d87a30215 (unknown)
> *** SIGABRT (@0x613a) received by PID 24890 (TID 0x4878f940) from PID 24890; stack trace: ***
>     @       0x3d8860e4c0 (unknown)
>     @       0x3d87a30215 (unknown)
>     @       0x3d87a31cc0 (unknown)
>     @     0x2b02c1bf96e5 mesos::internal::slave::ProcessIsolator::usage()
>     @     0x2b02c1b59a30 std::tr1::_Function_handler<>::_M_invoke()
>     @     0x2b02c1b5a361 std::tr1::function<>::operator()()
>     @     0x2b02c1b63f2b process::internal::pdispatcher<>()
>     @     0x2b02c1b5c45e std::tr1::_Function_handler<>::_M_invoke()
>     @     0x2b02c1dbf205 process::ProcessManager::resume()
>     @     0x2b02c1dbfbbf process::schedule()
>     @       0x3d88606367 (unknown)
>     @       0x3d87ad30ad (unknown)
>
>
>
>
> Guodong
>
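
For readers following the trace: the line "Result::get() but state == NONE" suggests that get() was called on a result holding no value, and the stack trace points at ProcessIsolator::usage() as the caller. The sketch below shows the general guard pattern only. MiniResult, lookupRss, and rssOrFallback are hypothetical names invented for this illustration; they are not the Mesos or stout API, and the actual fix in master may look quite different.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical three-state result (some / none / error), for illustration only.
template <typename T>
class MiniResult {
public:
  static MiniResult some(const T& t) { return MiniResult(SOME, t, ""); }
  static MiniResult none() { return MiniResult(NONE, T(), ""); }
  static MiniResult failure(const std::string& m) { return MiniResult(ERROR, T(), m); }

  bool isSome() const { return state_ == SOME; }
  bool isNone() const { return state_ == NONE; }
  bool isError() const { return state_ == ERROR; }

  // Calling get() without a value is a programming error; the assert here
  // plays the role of the abort seen in the log.
  const T& get() const { assert(isSome()); return value_; }
  const std::string& message() const { return message_; }

private:
  enum State { SOME, NONE, ERROR };
  MiniResult(State s, const T& v, const std::string& m)
    : state_(s), value_(v), message_(m) {}

  State state_;
  T value_;
  std::string message_;
};

// Hypothetical lookup: the map stands in for whatever the OS reports per pid.
// A pid missing from the table models a process that no longer exists.
MiniResult<long> lookupRss(const std::map<int, long>& table, int pid) {
  if (pid <= 0) return MiniResult<long>::failure("invalid pid");
  std::map<int, long>::const_iterator it = table.find(pid);
  if (it == table.end()) return MiniResult<long>::none();
  return MiniResult<long>::some(it->second);
}

// Guarded caller: checks the state first and returns a fallback instead of
// calling get() on a none or error result.
long rssOrFallback(const std::map<int, long>& table, int pid, long fallback) {
  MiniResult<long> result = lookupRss(table, pid);
  if (!result.isSome()) return fallback;
  return result.get();
}
```

With a table that contains only pid 24890, rssOrFallback returns the stored value for 24890 and the fallback for any other pid, rather than aborting the whole process.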
