Andrew Schwartzmeyer created MESOS-8519:
-------------------------------------------

             Summary: Fix recovery of job object isolated tasks
                 Key: MESOS-8519
                 URL: https://issues.apache.org/jira/browse/MESOS-8519
             Project: Mesos
          Issue Type: Choose from below ...
          Components: agent
         Environment: Windows 10 Client 16299.192
            Reporter: Andrew Schwartzmeyer
            Assignee: Andrew Schwartzmeyer


While the chain starting at https://reviews.apache.org/r/65397/ fixes many of 
the bugs leading up to the enabling of agent recovery on Windows (and indeed, 
enables it fully for Docker tasks), it explicitly does not yet enable the 
recovery of tasks contained in a job object.

This JIRA issue specifically covers the bug where the agent fails to find an 
existing task contained in a job object, because it cannot find the job object 
when it comes back up. The task still exists and, when first launched, is named 
appropriately. That name is checkpointed correctly and used by the recovering 
agent to find it again, but the lookup fails because the job object the task 
is in has "lost" its name.

Inspecting it in Process Explorer, I verified that the container process is 
initially in the correctly named job object. However, after the parent process 
(the initial Mesos agent) dies while the container is still running, Process 
Explorer reports "Access Denied" for the job object name.

My hypothesis is that this is related to the kernel object namespace mechanism. 
Currently researching.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
