[ https://issues.apache.org/jira/browse/MESOS-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-7478:
-----------------------------------
Description:
[~evilezh] reported the following crash in the agent upon running a 1.1.0
master against a 1.2.0 agent:
{noformat}
F0509 00:19:07.045413 3469 slave.cpp:4609] Check failed: resource.has_allocation_info()
*** Check failure stack trace: ***
@ 0x7f4c4a4fa3cd google::LogMessage::Fail()
@ 0x7f4c4a4fc180 google::LogMessage::SendToLog()
@ 0x7f4c4a4f9fb3 google::LogMessage::Flush()
@ 0x7f4c4a4fcba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4c49b3bcf5 mesos::internal::slave::Slave::getExecutorInfo()
@ 0x7f4c49b3cf76 mesos::internal::slave::Slave::runTask()
@ 0x7f4c49b8832c ProtobufProcess<>::handler4<>()
@ 0x7f4c49b4dc06 std::_Function_handler<>::_M_invoke()
@ 0x7f4c49b6975a ProtobufProcess<>::visit()
@ 0x7f4c4a46c933 process::ProcessManager::resume()
@ 0x7f4c4a477537 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f4c486b8c80 (unknown)
@ 0x7f4c481d46ba start_thread
@ 0x7f4c47f0a82d (unknown)
Aborted (core dumped)
{noformat}
This appears to have been due to a lack of manual upgrade testing (we also
don't have any automated upgrade testing in place).
The check in {{getExecutorInfo(...)}}
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp#L4609]
crashes with an old master because it runs before we inject the allocation
info in {{run(...)}}. See the {{runTask(...)}} call into {{getExecutorInfo(...)}}
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp#L1556].
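To make the ordering problem concrete, here is a minimal sketch. It is not the agent code: {{Resource}}, {{injectAllocationInfo}}, {{allAllocated}}, {{checkThenInject}} and {{injectThenCheck}} are hypothetical stand-ins, and the validation returns a bool instead of aborting the way a fatal CHECK would. It only assumes that an old master sends resources without allocation info and that the agent fills it in at some later point.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource message; not the Mesos protobuf.
struct Resource {
  std::string name;
  bool allocated;    // false when the sender never set allocation info
  std::string role;  // only meaningful when `allocated` is true

  explicit Resource(const std::string& n) : name(n), allocated(false) {}

  bool has_allocation_info() const { return allocated; }
};

// Fills in the allocation role on every resource that lacks one.
void injectAllocationInfo(std::vector<Resource>& resources,
                          const std::string& role) {
  assert(!role.empty());
  for (Resource& r : resources) {
    if (!r.has_allocation_info()) {
      r.role = role;
      r.allocated = true;
    }
  }
}

// Returns true only if every resource carries allocation info.
// The crash above comes from a fatal check; this sketch reports a bool.
bool allAllocated(const std::vector<Resource>& resources) {
  for (const Resource& r : resources) {
    if (!r.has_allocation_info()) {
      return false;
    }
  }
  return true;
}

// Ordering that mirrors the reported crash: validate first, inject later.
bool checkThenInject(std::vector<Resource> resources, const std::string& role) {
  const bool ok = allAllocated(resources);
  injectAllocationInfo(resources, role);
  return ok;
}

// Alternative ordering: inject first, so the validation sees complete data.
bool injectThenCheck(std::vector<Resource> resources, const std::string& role) {
  injectAllocationInfo(resources, role);
  return allAllocated(resources);
}
```

With a resource built the way an old master would send it (no allocation info), {{checkThenInject}} reports a failure while {{injectThenCheck}} passes. Whether the actual fix moves the injection or relaxes the check is not decided by this sketch.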
was:
[~evilezh] reported the following crash in the agent upon running a 1.1.0
master against a 1.2.0 agent:
{noformat}
F0509 00:19:07.045413 3469 slave.cpp:4609] Check failed: resource.has_allocation_info()
*** Check failure stack trace: ***
@ 0x7f4c4a4fa3cd google::LogMessage::Fail()
@ 0x7f4c4a4fc180 google::LogMessage::SendToLog()
@ 0x7f4c4a4f9fb3 google::LogMessage::Flush()
@ 0x7f4c4a4fcba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4c49b3bcf5 mesos::internal::slave::Slave::getExecutorInfo()
@ 0x7f4c49b3cf76 mesos::internal::slave::Slave::runTask()
@ 0x7f4c49b8832c ProtobufProcess<>::handler4<>()
@ 0x7f4c49b4dc06 std::_Function_handler<>::_M_invoke()
@ 0x7f4c49b6975a ProtobufProcess<>::visit()
@ 0x7f4c4a46c933 process::ProcessManager::resume()
@ 0x7f4c4a477537 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f4c486b8c80 (unknown)
@ 0x7f4c481d46ba start_thread
@ 0x7f4c47f0a82d (unknown)
Aborted (core dumped)
{noformat}
This appears to have been due to a lack of manual upgrade testing (we don't
have any automated upgrade testing in place).
The check in {{getExecutorInfo(...)}}
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp#L4609]
crashes with an old master because it occurs before our injection in
{{run(...)}}. See the {{runTask(...)}} call into {{getExecutorInfo(...)}}
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp#L1556].
> Pre-1.2.x master does not work with 1.2.x agent.
> ------------------------------------------------
>
> Key: MESOS-7478
> URL: https://issues.apache.org/jira/browse/MESOS-7478
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Reporter: Benjamin Mahler
> Priority: Blocker
>