[ https://issues.apache.org/jira/browse/MESOS-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349020#comment-16349020 ]
Andrew Schwartzmeyer commented on MESOS-8519:
---------------------------------------------
I mean quite literally the documentation states:
> The job is destroyed when its last handle has been closed _and all associated
> processes have been terminated_.
Emphasis mine. This is not true though. The job is destroyed when its last
handle has been closed. Period. Associated processes do not matter.
Okay, the job object itself might still be around (since closing the handle
doesn't kill the associated processes), but it becomes unusable and so, for our
purposes, is destroyed: you cannot obtain a new handle to it by name.
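
A rough way to check this claim is sketched below, assuming Python with ctypes
on Windows 8 or later. It creates a named job, assigns a still-running child
process to it, closes the only handle, and then tries to reopen the job by
name. The function name and the job name passed to it are hypothetical; the
Win32 calls are CreateJobObjectW, OpenProcess, AssignProcessToJobObject,
OpenJobObjectW and CloseHandle. If the behavior described above holds, the
function should return False; I have not recorded an output here.

```python
import ctypes
import subprocess
import sys

# Access-right constants from the Win32 headers.
JOB_OBJECT_QUERY = 0x0004
PROCESS_TERMINATE = 0x0001
PROCESS_SET_QUOTA = 0x0100


def name_survives_last_handle(job_name):
    """Return True if a named job can be reopened by name after its last
    handle is closed while a member process is still running, False if it
    cannot, and None on platforms without job objects."""
    if sys.platform != "win32":
        return None

    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateJobObjectW.argtypes = [wintypes.LPVOID, wintypes.LPCWSTR]
    k32.CreateJobObjectW.restype = wintypes.HANDLE
    k32.OpenJobObjectW.argtypes = [wintypes.DWORD, wintypes.BOOL,
                                   wintypes.LPCWSTR]
    k32.OpenJobObjectW.restype = wintypes.HANDLE
    k32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    k32.OpenProcess.restype = wintypes.HANDLE
    k32.AssignProcessToJobObject.argtypes = [wintypes.HANDLE, wintypes.HANDLE]
    k32.AssignProcessToJobObject.restype = wintypes.BOOL
    k32.CloseHandle.argtypes = [wintypes.HANDLE]
    k32.CloseHandle.restype = wintypes.BOOL

    # A child that outlives the job handle, standing in for the container.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    job = None
    proc = None
    try:
        job = k32.CreateJobObjectW(None, job_name)
        if not job:
            raise ctypes.WinError(ctypes.get_last_error())

        proc = k32.OpenProcess(PROCESS_TERMINATE | PROCESS_SET_QUOTA,
                               False, child.pid)
        if not proc:
            raise ctypes.WinError(ctypes.get_last_error())

        if not k32.AssignProcessToJobObject(job, proc):
            raise ctypes.WinError(ctypes.get_last_error())

        # Close the only job handle; the child is still running in the job.
        k32.CloseHandle(job)
        job = None

        reopened = k32.OpenJobObjectW(JOB_OBJECT_QUERY, False, job_name)
        if reopened:
            k32.CloseHandle(reopened)
            return True
        return False
    finally:
        if proc:
            k32.CloseHandle(proc)
        if job:
            k32.CloseHandle(job)
        child.kill()
        child.wait()
```

This mirrors the agent case only loosely: here the creating process closes its
handle explicitly, whereas in the bug the agent process dies and the kernel
closes the handle on its behalf.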
> Fix recovery of job object isolated tasks
> -----------------------------------------
>
> Key: MESOS-8519
> URL: https://issues.apache.org/jira/browse/MESOS-8519
> Project: Mesos
> Issue Type: Choose from below ...
> Components: agent
> Environment: Windows 10 Client 16299.192
> Reporter: Andrew Schwartzmeyer
> Assignee: Andrew Schwartzmeyer
> Priority: Major
> Labels: windows
>
> While the chain starting at https://reviews.apache.org/r/65397/ fixes many of
> the bugs leading up to the enabling of agent recovery on Windows (and indeed,
> enables it fully for Docker tasks), it explicitly does not yet enable the
> recovery of tasks contained in a job object.
> This JIRA issue specifically covers the bug where the agent fails to find an
> existing job-object-contained task, because it cannot find the job object
> when it's back up. The task still exists and, when first launched, is named
> appropriately; that name is checkpointed correctly and used by the
> recovering agent to find it again, but the lookup fails because the job
> object the task is in has "lost" its name.
> Inspecting it in Process Explorer, I verified that the container process is
> initially in the correctly named job object, but after the parent process
> (the initial Mesos agent) dies while the container is still running, Process
> Explorer reports "Access Denied" for the job object name.
> My hypothesis is that this is related to the kernel object namespace
> mechanism. Currently researching.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)