Benno Evers created MESOS-9657:
----------------------------------
Summary: Launching a command task twice can crash the agent
Key: MESOS-9657
URL: https://issues.apache.org/jira/browse/MESOS-9657
Project: Mesos
Issue Type: Bug
Reporter: Benno Evers
When launching a command task, we verify that the framework has no existing
executor for that task:
{noformat}
// We are dealing with command task; a new command executor will be
// launched.
CHECK(executor == nullptr);
{noformat}
and afterwards an executor is created with the same executor id as the task id:
{noformat}
// (slave.cpp)
// Either the master explicitly requests launching a new executor
// or we are in the legacy case of launching one if there wasn't
// one already. Either way, let's launch executor now.
if (executor == nullptr) {
  Try<Executor*> added = framework->addExecutor(executorInfo);
  [...]
{noformat}
This means that relaunching a task with the same task id before the
executor is removed trips the CHECK above and crashes the agent:
{noformat}
F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr
*** Check failure stack trace: ***
@ 0x7feb29a407af google::LogMessage::Flush()
@ 0x7feb29a43c3f google::LogMessageFatal::~LogMessageFatal()
@ 0x7feb28a5a886 mesos::internal::slave::Slave::__run()
@ 0x7feb28af4f0e _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEEEEEclEOS3_
@ 0x7feb2998a620 process::ProcessBase::consume()
@ 0x7feb29987675 process::ProcessManager::resume()
@ 0x7feb299a2d2b _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8EEEEE6_M_runEv
@ 0x7feb2632f523 (unknown)
@ 0x7feb25e40594 start_thread
@ 0x7feb25b73e6f __GI___clone
Aborted (core dumped)
{noformat}
Instead of crashing, the agent should just drop the task with an appropriate
error in this case.
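
The proposed behavior could look roughly like the sketch below. It is a
simplified, self-contained model and not the actual slave.cpp code: the
Framework and Executor structs, getExecutor() and launchCommandTask() are
hypothetical stand-ins, and the error text is made up for illustration. The
only point is that a duplicate executor id yields an error the caller can use
to drop the task, instead of a fatal CHECK.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the agent's data structures.
// These are NOT the real Mesos types.
struct Executor {
  std::string id;
};

struct Framework {
  std::map<std::string, Executor> executors;

  // Returns the executor with the given id, or nullptr if there is none.
  Executor* getExecutor(const std::string& id) {
    auto it = executors.find(id);
    return it == executors.end() ? nullptr : &it->second;
  }
};

// Returns an error message if the task has to be dropped, or std::nullopt
// if a new command executor was registered for it.
std::optional<std::string> launchCommandTask(
    Framework& framework, const std::string& taskId) {
  // For a command task, the executor id is the same as the task id.
  if (framework.getExecutor(taskId) != nullptr) {
    // The original code does CHECK(executor == nullptr) here, which
    // aborts the whole process. Report an error instead.
    return "Executor '" + taskId + "' for command task already exists";
  }

  framework.executors.emplace(taskId, Executor{taskId});
  return std::nullopt;
}
```

With this shape, launching "task-1" twice in a row returns no error the first
time and an error the second time, and the executor map still holds exactly
one entry. In the real agent the caller would presumably turn that error into
a terminal status update for the task, but that part is outside this sketch.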