Hey guys,

I'm getting another slave crash in the process usage code
(`ProcessIsolator::usage()`).  Here's the slave log:

I0701 16:44:51.263236 11682 slave.cpp:528] New master detected at
> [email protected]:5050
> I0701 16:44:51.263598 11672 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201306291951-3660134922-5050-13580-1538' for removal
> I0701 16:44:51.264078 11676 status_update_manager.cpp:155] New master
> detected at [email protected]:5050
> I0701 16:44:51.283917 11672 slave.cpp:588] Registered with master
> [email protected]:5050; given slave ID
> 201306291951-3660134922-5050-13580-2759
> I0701 16:44:52.194198 11657 slave.cpp:1413] Got registration for executor
> 'executor_Task_Tracker_37434' of framework
> 201306291951-3660134922-5050-13580-0002
> W0701 16:44:52.194744 11657 slave.cpp:1438]  Shutting down executor
> 'executor_Task_Tracker_37434' as the framework
> 201306291951-3660134922-5050-13580-0002 does not
> exist
> I0701 16:44:56.630949 11666 slave.cpp:738] Got assigned task
> Task_Tracker_37439 for framework 201306291951-3660134922-5050-13580-0002
> I0701 16:44:56.632294 11666 slave.cpp:836] Launching task
> Task_Tracker_37439 for framework 201306291951-3660134922-5050-13580-0002
> I0701 16:44:56.634282 11666 paths.hpp:303] Created executor directory
> '/tmp/mesos/slaves/201306291951-3660134922-5050-13580-2759/frameworks/201306291951-3660134922-5050-13580-0002/executors/executor_Task_Tracker_37439/runs/cf5d7062-b1cd-4da6-8b67-e7d3caa8bc9d'
> I0701 16:44:56.634918 11666 slave.cpp:947] Queuing task
> 'Task_Tracker_37439' for executor executor_Task_Tracker_37439 of framework
> '201306291951-3660134922-5050-13580-0002
> I0701 16:44:56.634908 11683 process_isolator.cpp:99] Launching
> executor_Task_Tracker_37439 (cd hadoop-* && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201306291951-3660134922-5050-13580-2759/frameworks/201306291951-3660134922-5050-13580-0002/executors/executor_Task_Tracker_37439/runs/cf5d7062-b1cd-4da6-8b67-e7d3caa8bc9d
> with resources cpus=1; mem=5000' for framework
> 201306291951-3660134922-5050-13580-0002
> I0701 16:44:56.637537 11666 slave.cpp:510] Successfully attached file
> '/tmp/mesos/slaves/201306291951-3660134922-5050-13580-2759/frameworks/201306291951-3660134922-5050-13580-0002/executors/executor_Task_Tracker_37439/runs/cf5d7062-b1cd-4da6-8b67-e7d3caa8bc9d'
> I0701 16:44:56.637923 11683 process_isolator.cpp:161] Forked executor at
> 11809
> Fetching resources into
> '/tmp/mesos/slaves/201306291951-3660134922-5050-13580-2759/frameworks/201306291951-3660134922-5050-13580-0002/executors/executor_Task_Tracker_37439/runs/cf5d7062-b1cd-4da6-8b67-e7d3caa8bc9d'
> Fetching resource 'hdfs://airfs-h1/hadoop-2.0.0-mr1-cdh4.2.1-mesos.tar.xz'
> Downloading resource from
> 'hdfs://airfs-h1/hadoop-2.0.0-mr1-cdh4.2.1-mesos.tar.xz'
> HDFS command: hadoop fs -copyToLocal
> 'hdfs://airfs-h1/hadoop-2.0.0-mr1-cdh4.2.1-mesos.tar.xz'
> './hadoop-2.0.0-mr1-cdh4.2.1-mesos.tar.xz'
> Extracting resource: tar xJf './hadoop-2.0.0-mr1-cdh4.2.1-mesos.tar.xz'
> Try::get() but state == ERROR: Argument larger than the maximum number of
> seconds that a Duration can represent due to int64_t's size limit.
> *** Aborted at 1372697101 (unix time) try "date -d @1372697101" if you are
> using GNU date ***
> PC: @     0x7f907ac82425 (unknown)
> *** SIGABRT (@0x2d69) received by PID 11625 (TID 0x7f906e4b8700) from PID
> 11625; stack trace: ***
>     @     0x7f907b01acb0 (unknown)
>     @     0x7f907ac82425 (unknown)
>     @     0x7f907ac85b8b (unknown)
>     @     0x7f907bb274ea os::process()
>     @     0x7f907bb296d2 os::processes()
>     @     0x7f907bb2b78c os::children()
>     @     0x7f907bb1f5d3 mesos::internal::slave::ProcessIsolator::usage()
>     @     0x7f907baaa5b0 std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f907bab8526 process::internal::pdispatcher<>()
>     @     0x7f907baab808 std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f907bc9d17c process::ProcessManager::resume()
>     @     0x7f907bc9dddc process::schedule()
>     @     0x7f907b012e9a start_thread
>     @     0x7f907ad3fccd (unknown)
> I0701 16:45:02.274014 11899 main.cpp:119] Creating "process" isolator
> I0701 16:45:02.274749 11899 main.cpp:127] Build: 2013-06-18 01:38:35 by
> I0701 16:45:02.274782 11899 main.cpp:128] Starting Mesos slave



Here's the gdb backtrace:

(gdb) bt
> #0  0x00007f8da4577425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f8da457ab8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007f8da541c4ea in get (this=0x7f8d965a91e0) at
> ../../3rdparty/libprocess/3rdparty/stout/include/stout/try.hpp:66
> #3  os::process (pid=<optimized out>) at
> ../../3rdparty/libprocess/3rdparty/stout/include/stout/os/linux.hpp:57
> #4  0x00007f8da541e6d2 in os::processes () at
> ../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:984
> #5  0x00007f8da542078c in os::children (pid=12260, recursive=true) at
> ../../3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:997
> #6  0x00007f8da54145d3 in mesos::internal::slave::ProcessIsolator::usage
> (this=<optimized out>, frameworkId=..., executorId=...)
>     at ../../src/slave/process_isolator.cpp:396
> #7  0x00007f8da539f5b0 in operator() (__args#1=..., __args#0=...,
> this=<optimized out>, __object=<optimized out>) at
> /usr/include/c++/4.6/tr1/functional:572
> #8  __call<mesos::internal::slave::Isolator*&, 0, 1, 2> (__args=...,
> this=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1153
> #9  operator()<mesos::internal::slave::Isolator*> (this=<optimized out>)
> at /usr/include/c++/4.6/tr1/functional:1207
> #10 std::tr1::_Function_handler<process::Future<mesos::ResourceStatistics>
> (mesos::internal::slave::Isolator*),
> std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<mesos::ResourceStatistics>
> (mesos::internal::slave::Isolator::*)(mesos::FrameworkID const&,
> mesos::ExecutorID const&)> (std::tr1::_Placeholder<1>, mesos::FrameworkID,
> mesos::ExecutorID)> >::_M_invoke(std::tr1::_Any_data const&,
> mesos::internal::slave::Isolator*) (__functor=..., __args#0=<optimized out>)
>     at /usr/include/c++/4.6/tr1/functional:1670
> #11 0x00007f8da53ad526 in operator() (__args#0=<optimized out>,
> this=<optimized out>) at /usr/include/c++/4.6/tr1/functional:2040
> #12 process::internal::pdispatcher<mesos::ResourceStatistics,
> mesos::internal::slave::Isolator>(process::ProcessBase*,
> std::tr1::shared_ptr<std::tr1::function<process::Future<mesos::ResourceStatistics>
> (mesos::internal::slave::Isolator*)> >,
> std::tr1::shared_ptr<process::Promise<mesos::ResourceStatistics> >) (
>     process=<optimized out>, thunk=..., promise=...) at
> ../../3rdparty/libprocess/include/process/dispatch.hpp:86
> #13 0x00007f8da53a0808 in __call<process::ProcessBase*&, 0, 1, 2>
> (__args=..., this=<optimized out>) at
> /usr/include/c++/4.6/tr1/functional:1153
> #14 operator()<process::ProcessBase*> (this=<optimized out>) at
> /usr/include/c++/4.6/tr1/functional:1207
> #15 std::tr1::_Function_handler<void (process::ProcessBase*),
> std::tr1::_Bind<void (*(std::tr1::_Placeholder<1>,
> std::tr1::shared_ptr<std::tr1::function<process::Future<mesos::ResourceStatistics>
> (mesos::internal::slave::Isolator*)> >,
> std::tr1::shared_ptr<process::Promise<mesos::ResourceStatistics>
> >))(process::ProcessBase*,
> std::tr1::shared_ptr<std::tr1::function<process::Future<mesos::ResourceStatistics>
> (mesos::internal::slave::Isolator*)> >,
> std::tr1::shared_ptr<process::Promise<mesos::ResourceStatistics> >)>
> >::_M_invoke(std::tr1::_Any_data const&, process::ProcessBase*)
> (__functor=..., __args#0=<optimized out>)
>     at /usr/include/c++/4.6/tr1/functional:1684
> #16 0x00007f8da559217c in process::ProcessManager::resume (this=0xeedf20,
> process=0xf03df8) at ../../../3rdparty/libprocess/src/process.cpp:2446
> #17 0x00007f8da5592ddc in process::schedule (arg=<optimized out>) at
> ../../../3rdparty/libprocess/src/process.cpp:1175
> #18 0x00007f8da4907e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #19 0x00007f8da4634ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #20 0x0000000000000000 in ?? ()
> (gdb)



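The abort comes from `Try::get()` inside `os::process` (stout/os/linux.hpp:57),
and the error says the argument is larger than the maximum number of seconds
a Duration can represent.  So it looks like either `ticks` or `utime`/`stime`
has an erroneous value when the CPU times are converted.  Thoughts?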
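
To make the suspicion concrete, here is a rough sketch of how a tick count or tick rate that is off could push a conversion past what an int64_t can hold. This is not the stout code: `cpuTicksToNanos` is a hypothetical helper, and it assumes (going only by the error text) that a Duration is ultimately bounded by an int64_t count of nanoseconds. It returns false instead of aborting when the inputs cannot be represented:

```cpp
#include <cstdint>
#include <limits>

// Assumption: durations are bounded by an int64_t number of nanoseconds.
const int64_t kNanosPerSecond = 1000000000LL;

// Hypothetical helper (not part of stout): converts a CPU time given in
// clock ticks (e.g. utime or stime) into nanoseconds. Returns false when
// the tick rate is unusable or the result would not fit in an int64_t,
// rather than producing a value the caller would have to abort on.
bool cpuTicksToNanos(uint64_t cpuTicks, long ticksPerSecond, int64_t* nanos)
{
  // A zero or negative tick rate (for instance, an error return that was
  // never checked) makes the conversion meaningless.
  if (ticksPerSecond <= 0 || ticksPerSecond > kNanosPerSecond) {
    return false;
  }

  // Integer math only; this truncates if the rate does not divide 1e9.
  const uint64_t nanosPerTick =
    static_cast<uint64_t>(kNanosPerSecond / ticksPerSecond);

  // Largest tick count whose nanosecond value still fits in an int64_t.
  const uint64_t maxTicks =
    static_cast<uint64_t>(std::numeric_limits<int64_t>::max()) / nanosPerTick;

  if (cpuTicks > maxTicks) {
    return false;
  }

  *nanos = static_cast<int64_t>(cpuTicks * nanosPerTick);
  return true;
}
```

With a rate of 100 ticks per second, `utime` would have to exceed roughly 9.2e11 ticks (about 292 years of CPU time) to overflow, so if the real conversion is anything like this, a garbage `utime`/`stime` or a bad `ticks` value seems more likely than a genuinely large one.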
