Steve Niemitz created MESOS-2215:
------------------------------------

             Summary: If checkpointing is enabled on a framework, recovered tasks are no longer monitored once the slave restarts
                 Key: MESOS-2215
                 URL: https://issues.apache.org/jira/browse/MESOS-2215
             Project: Mesos
          Issue Type: Bug
            Reporter: Steve Niemitz


Once the slave restarts and recovers the tasks, I see the following error in
the log roughly every second for each recovered task. Note that these were NOT
Docker tasks:

W0113 16:01:00.790323 773142 monitor.cpp:213] Failed to get resource usage for container 7b729b89-dc7e-4d08-af97-8cd1af560a21 for executor thermos-1421085237813-slipstream-prod-agent-3-8f769514-1835-4151-90d0-3f55dcc940dd of framework 20150109-161713-715350282-5050-290797-0000: Failed to 'docker inspect mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21': exit status = exited with status 1 stderr = Error: No such image or container: mesos-7b729b89-dc7e-4d08-af97-8cd1af560a21

However, the tasks themselves are still healthy and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
