Andrew Schwartzmeyer created MESOS-8519:
-------------------------------------------
Summary: Fix recovery of job object isolated tasks
Key: MESOS-8519
URL: https://issues.apache.org/jira/browse/MESOS-8519
Project: Mesos
Issue Type: Choose from below ...
Components: agent
Environment: Windows 10 Client 16299.192
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer
While the chain starting at https://reviews.apache.org/r/65397/ fixes many of
the bugs leading up to the enabling of agent recovery on Windows (and indeed,
enables it fully for Docker tasks), it explicitly does not yet enable the
recovery of tasks contained in a job object.
This JIRA issue specifically covers the bug where the recovering agent fails to
find an existing task contained in a job object, because it cannot find the job
object once it's back up. The task still exists, and when first launched it is
named appropriately; that name is checkpointed correctly and used by the
recovering agent to find it again, but the lookup fails because the job object
the task is in has "lost" its name.
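To make the failing step concrete, here is a minimal sketch of a name-based job
object lookup. The helper names `job_name_for` and `open_job_for`, and the
"MESOS_JOB_" plus hex PID naming scheme, are assumptions for illustration and
not necessarily what the agent does; `OpenJobObjectW` is the Win32 call that
opens an existing job object by name.

```cpp
#include <ios>
#include <sstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical naming scheme: derive the job object name from the PID of
// the container's root process. The prefix and the hex formatting are
// assumptions made for this sketch.
std::wstring job_name_for(unsigned long pid)
{
  std::wostringstream out;
  out << L"MESOS_JOB_" << std::uppercase << std::hex << pid;
  return out.str();
}

#ifdef _WIN32
// Hypothetical helper: re-open the job object from the checkpointed PID.
// Returns nullptr when no job object with that name can be opened; call
// GetLastError() for the reason. The caller must CloseHandle() a non-null
// result.
HANDLE open_job_for(unsigned long pid)
{
  const std::wstring name = job_name_for(pid);
  return ::OpenJobObjectW(JOB_OBJECT_ALL_ACCESS, FALSE, name.c_str());
}
#endif
```

In this sketch, the bug described above would show up as `open_job_for`
returning nullptr for a PID whose process is still running, because the name
no longer resolves to the job object.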
Inspecting it in Process Explorer, I verified that the container process is
initially in the correctly named job object, but after the parent process (the
initial Mesos agent) dies while the container is still running, Process
Explorer reports "Access Denied" for the job object name.
My hypothesis is that this is related to the kernel object namespace mechanism.
Currently researching.