Hi Ben/Scott,

Can you provide the slave log of the repro?
thanx,
@vinodkone

On Mon, May 7, 2012 at 10:00 AM, Benjamin Hindman wrote:

> Hi Scott,
>
> Thanks for the report. I've been able to reproduce this and it is indeed a
> regression. I've filed https://issues.apache.org/jira/browse/MESOS-190,
> and hopefully we'll get a fix out the door ASAP.
>
> Ben.
>
>
> On Fri, May 4, 2012 at 5:11 PM, Scott Smith wrote:
>
> > When I restart/kill early or otherwise interrupt my framework from the
> > client, I often segfault the slave. I'm not sure if there is a bug in
> > my executor, but it seems Mesos should be more resilient than this.
> >
> > Mesos subversion -r 1331158
> >
> > I know optimized builds can be tricky to debug, but in this case it
> > does look like it was trying to dereference the invalid Task* address
> > (note that task matches %rdx, and the crashed assembly code is trying
> > to dereference %rdx).
> >
> > Any suggestions?
> >
> > (gdb) bt
> > #0  mesos::internal::slave::Slave::executorExited (this=0x1305820, frameworkId=..., executorId=..., status=0) at slave/slave.cpp:1400
> > #1  0x00007f0cf310526d in __call<process::ProcessBase*&, 0, 1> (__args=..., this=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1153
> > #2  operator()<process::ProcessBase*> (this=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1207
> > #3  std::tr1::_Function_handler<void (process::ProcessBase*), std::tr1::_Bind<void (*(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<void (mesos::internal::slave::Slave*)> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<void (mesos::internal::slave::Slave*)> >)> >::_M_invoke(std::tr1::_Any_data const&, process::ProcessBase*) (__functor=..., __args#0=<optimized out>) at /usr/include/c++/4.6/tr1/functional:1684
> > #4  0x00007f0cf32014a3 in std::tr1::function<void (process::ProcessBase*)>::operator()(process::ProcessBase*) const () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
> > #5  0x00007f0cf31f617f in process::ProcessBase::visit(process::DispatchEvent const&) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
> > #6  0x00007f0cf31f885c in process::DispatchEvent::visit(process::EventVisitor*) const () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
> > #7  0x00007f0cf31f38cf in process::ProcessManager::resume(process::ProcessBase*) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
> > #8  0x00007f0cf31ec783 in process::schedule(void*) () from /home/ubuntu/cr/lib/libmesos-0.9.0.so
> > #9  0x00007f0cf26e5e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #10 0x00007f0cf24134bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > #11 0x0000000000000000 in ?? ()
> > (gdb) print task
> > $1 = (mesos::internal::Task *) 0x3031406576616c73
> > (gdb) info register
> > rax            0x7f0cf3647cf0     139693599784176
> > rbx            0x0                0
> > rcx            0x7f0ce8000038     139693408649272
> > rdx            0x3031406576616c73 3472627592201333875
> > rsi            0x2                2
> > rdi            0x7f0cf0613ac0     139693549238976
> > rbp            0x7f0ce80034c8     0x7f0ce80034c8
> > rsp            0x7f0cf0613c00     0x7f0cf0613c00
> > r8             0x7f0ce80009b0     139693408651696
> > r9             0x1                1
> > r10            0x6                6
> > r11            0x1                1
> > r12            0x7f0ce8001ca0     139693408656544
> > r13            0x7f0ce80056c0     139693408671424
> > r14            0x7f0ce8006cc0     139693408677056
> > r15            0x1305820          19945504
> > rip            0x7f0cf30fecd5     0x7f0cf30fecd5 <mesos::internal::slave::Slave::executorExited(mesos::FrameworkID const&, mesos::ExecutorID const&, int)+533>
> > eflags         0x10206            [ PF IF RF ]
> > cs             0xe033             57395
> > ss             0xe02b             57387
> > ds             0x0                0
> > es             0x0                0
> > fs             0x0                0
> > gs             0x0                0
> >
> > disassemble:
> >
> >    0x00007f0cf30fecb9 <+505>:   mov    %rax,0x20(%rsp)
> >    0x00007f0cf30fecbe <+510>:   xor    %ebx,%ebx
> >    0x00007f0cf30fecc0 <+512>:   cmp    0x20(%rsp),%r12
> >    0x00007f0cf30fecc5 <+517>:   je     0x7f0cf30fed2e <mesos::internal::slave::Slave::executorExited(mesos::FrameworkID const&, mesos::ExecutorID const&, int)+622>
> >    0x00007f0cf30fecc7 <+519>:   test   %r12,%r12
> >    0x00007f0cf30fecca <+522>:   je     0x7f0cf30ff27d <mesos::internal::slave::Slave::executorExited(mesos::FrameworkID const&, mesos::ExecutorID const&, int)+1981>
> >    0x00007f0cf30fecd0 <+528>:   mov    0x28(%r12),%rdx
> > => 0x00007f0cf30fecd5 <+533>:   mov    0x70(%rdx),%edi
> >    0x00007f0cf30fecd8 <+536>:   mov    %rdx,0x8(%rsp)
> >    0x00007f0cf30fecdd <+541>:   callq  0x7f0cf3062220 <_ZN5mesos8internal5slave19isTerminalTaskStateENS_9TaskStateE@plt>
> >    0x00007f0cf30fece2 <+546>:   test   %al,%al
> >    0x00007f0cf30fece4 <+548>:   mov    0x8(%rsp),%rdx
> >    0x00007f0cf30fece9 <+553>:   je     0x7f0cf30ff020 <mesos::internal::slave::Slave::executorExited(mesos::FrameworkID const&, mesos::ExecutorID const&, int)+1376>
> >    0x00007f0cf30fecef <+559>:   test   %rbp,%rbp
> >    0x00007f0cf30fecf2 <+562>:   je     0x7f0cf30ff244 <mesos::internal::slave::Slave::executorExited(mesos::FrameworkID const&, mesos::ExecutorID const&, int)+1---Type <return> to continue, or q <re
> >
> > --
> > Scott
