Can you provide:
1. The version that you are upgrading from.
2. Whether you made any OS / init system changes alongside this upgrade
(just to narrow the scope).

It is possible that you are upgrading from a version that did not have
systemd support to one that does. If so, the upgrade may require restarting
the tasks (either on their own, or by starting a fresh agent). Please
take a look at the work in MESOS-3007 to get a better understanding of
the issue I am referring to.

If you can verify that you are making this kind of transition (from a
version without systemd support to one with it), then you can devise a
plan for your upgrade.
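
As a rough way to check this on an affected host, here is a minimal Python
sketch (my own illustration, not something taken from Mesos). It pulls the
pids out of the "Couldn't find pid" warnings quoted below and checks whether
each pid's /proc/<pid>/cgroup lists a path under mesos_executors.slice. The
function names are hypothetical, it assumes unwrapped log lines, and it
accepts a match in any cgroup hierarchy, which is looser than whatever check
the agent itself performs.

```python
import re
from pathlib import Path

# Matches the warning text seen in the quoted agent log (linux_launcher.cpp).
MISSING_PID = re.compile(r"Couldn't find pid '(\d+)' in '([^']+)'")


def missing_pids(log_text):
    """Return (pid, slice) pairs found in unwrapped agent log lines."""
    return [(int(pid), unit) for pid, unit in MISSING_PID.findall(log_text)]


def in_slice(cgroup_text, slice_name="mesos_executors.slice"):
    """Check the content of a /proc/<pid>/cgroup file for slice_name.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    Assumption: a match in any hierarchy counts.
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and slice_name in parts[2].split("/"):
            return True
    return False


def check_pid(pid, slice_name="mesos_executors.slice"):
    """Read the live cgroup file for pid; None if it cannot be read."""
    path = Path("/proc") / str(pid) / "cgroup"
    try:
        return in_slice(path.read_text(), slice_name)
    except OSError:
        return None
```

Running check_pid over the pids returned by missing_pids should show which
executors are still outside the slice after the new agent has recovered.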

Joris

--
*Joris Van Remoortere*
Mesosphere

On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen wrote:

> Hi all,
>
> I ran into an issue when upgrading mesos-slave to 0.28.2.
>
> During the stage where mesos-slave recovers the framework containers, it
> produced the following errors.
>
>
> ```
> Log file created at: 2016/06/15 15:01:43
> Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
> '42322' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't find pid
> '42312' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.286669  4182 linux_launcher.cpp:197] Couldn't find pid
> '42309' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.287144  4182 linux_launcher.cpp:197] Couldn't find pid
> '42304' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.287636  4182 linux_launcher.cpp:197] Couldn't find pid
> '42300' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.288120  4182 linux_launcher.cpp:197] Couldn't find pid
> '42317' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> E0615 15:01:43.471676  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476007  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476143  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476272  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476483  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476618  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
>
> ```
>
> It also appears to cause OOM errors, such as:
>
> ```
> I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events
> for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.325469  4172 mem.cpp:722] Started listening on low memory
> pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.326004  4172 mem.cpp:722] Started listening on medium
> memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.326539  4172 mem.cpp:722] Started listening on critical
> memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>
> ```
>
> Has anyone else run into this? Thanks.
>
> --
> Best Regards,
> Chen, Qiang
>
>
