[ https://issues.apache.org/jira/browse/MESOS-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone resolved MESOS-367.
------------------------------
Resolution: Fixed
> Invalid StatusUpdateMessage from missing slave id.
> --------------------------------------------------
>
> Key: MESOS-367
> URL: https://issues.apache.org/jira/browse/MESOS-367
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
> Assignee: Vinod Kone
> Priority: Critical
>
> It looks like the ExecutorProcess sets its internal slaveId upon registration:
>
>   void registered(const ExecutorInfo& executorInfo,
>                   const FrameworkID& frameworkId,
>                   const FrameworkInfo& frameworkInfo,
>                   const SlaveID& slaveId,
>                   const SlaveInfo& slaveInfo)
>   {
>     if (aborted) {
>       VLOG(1) << "Ignoring registered message from slave " << slaveId
>               << " because the driver is aborted!";
>       return;
>     }
>     VLOG(1) << "Executor registered on slave " << slaveId;
>     this->slaveId = slaveId;  // <-- slaveId is only set here
>     executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
>   }
>
> As a result, if registration is delayed, the executor can come up and send a
> status update before the slaveId is set, producing an incomplete protobuf:
>
>   void sendStatusUpdate(const TaskStatus& status)
>   {
>     VLOG(1) << "Executor sending status update for task "
>             << status.task_id() << " in state " << status.state();
>     if (status.state() == TASK_STAGING) {
>       VLOG(1) << "Executor is not allowed to send "
>               << "TASK_STAGING status updates. Aborting!";
>       driver->abort();
>       executor->error(driver, "Attempted to send TASK_STAGING status update");
>       return;
>     }
>     StatusUpdateMessage message;
>     StatusUpdate* update = message.mutable_update();
>     update->mutable_framework_id()->MergeFrom(frameworkId);
>     update->mutable_executor_id()->MergeFrom(executorId);
>     update->mutable_slave_id()->MergeFrom(slaveId);  // <-- slaveId may still be unset here
>     update->mutable_status()->MergeFrom(status);
>     update->set_timestamp(Clock::now());
>     update->set_uuid(UUID::random().toBytes());
>     send(slave, message);
>   }
>
> The ExecutorProcess should take the slaveId in its constructor, so that it is
> always set before any status update is sent.
>
> Here are the relevant log lines:
> I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
> I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
> I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
> I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
> libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
> W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value
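
The suggestion in the description, passing the slaveId to the ExecutorProcess constructor, could look roughly like the sketch below. It is an illustration only, not the change that resolved this issue: SlaveID and StatusUpdate here are simplified stand-in structs rather than the Mesos protobuf types, and makeStatusUpdate() and complete() are hypothetical helpers introduced for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-ins for the protobuf messages (assumption: these are
// NOT the real Mesos types, only enough structure to show the idea).
struct SlaveID { std::string value; };

struct StatusUpdate {
  std::string slave_id;  // Required field in the real message.
  std::string task_id;

  // Hypothetical helper that mimics a required-field check.
  bool complete() const { return !slave_id.empty() && !task_id.empty(); }
};

class ExecutorProcess {
public:
  // The slave id is supplied at construction, so it is available before
  // any status update is built, however late the registered message arrives.
  explicit ExecutorProcess(SlaveID slaveId) : slaveId_(std::move(slaveId)) {
    assert(!slaveId_.value.empty());
  }

  // Builds an update; the slave id no longer depends on registered().
  StatusUpdate makeStatusUpdate(const std::string& taskId) const {
    StatusUpdate update;
    update.slave_id = slaveId_.value;
    update.task_id = taskId;
    return update;
  }

private:
  const SlaveID slaveId_;
};
```

Because slaveId_ is a const member initialized in the constructor, there is no window in which an update can be built with the field unset, which is the race described above.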