James DeFelice created MESOS-4565:
-------------------------------------

             Summary: slave recovers and attempts to destroy executor's child 
containers, then begins rejecting task status updates
                 Key: MESOS-4565
                 URL: https://issues.apache.org/jira/browse/MESOS-4565
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.26.0
            Reporter: James DeFelice


AFAICT the slave is doing this:

1) recovering from some kind of failure
2) checking the containers that it pulled from its state store
3) complaining about cgroup children hanging off of executor containers
4) rejecting task status updates related to the executor container, the first 
of which in the logs is:

{code}
E0130 02:22:21.979852 12683 slave.cpp:2963] Failed to update resources for container 1d965a20-849c-40d8-9446-27cb723220a9 of executor 'd701ab48a0c0f13_k8sm-executor' running task pod.f2dc2c43-c6f7-11e5-ad28-0ad18c5e6c7f on status update for terminal task, destroying container: Container '1d965a20-849c-40d8-9446-27cb723220a9' not found
{code}
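
For anyone triaging the linked slave log, here is a minimal Python sketch that pulls the container, executor, and task IDs out of lines shaped like the one above, so the rejected updates can be matched against the containers the slave tried to destroy during recovery. This is not part of Mesos: the function name and both regexes are assumptions, built only from the standard glog line prefix and the single message quoted in this report, so other message variants will not match.

```python
import re

# glog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line], message
GLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

# Message shape assumed from the one log line quoted in this report
UPDATE_FAILURE_RE = re.compile(
    r"Failed to update resources for container (?P<container>\S+) "
    r"of executor '(?P<executor>[^']+)' running task (?P<task>\S+)"
)


def parse_update_failure(line):
    """Return the IDs from an update-failure line, or None for any other line."""
    header = GLOG_RE.match(line.strip())
    if header is None:
        return None
    body = UPDATE_FAILURE_RE.search(header.group("message"))
    if body is None:
        return None
    return {
        "severity": header.group("severity"),
        "time": header.group("time"),
        "source": header.group("source") + ":" + header.group("line"),
        "container": body.group("container"),
        "executor": body.group("executor"),
        "task": body.group("task"),
    }
```

Each log line has to be passed in unwrapped (one glog record per string). Lines that do not carry this specific message return None, so the function can be mapped over the whole log and the results filtered.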

To be fair, I don't believe that my custom executor is re-registering properly 
with the slave before it sends these (failing) status updates. But the slave 
doesn't complain about that; it complains that it can't find the 
**container**.

slave log here:
https://gist.github.com/jdef/265663461156b7a7ed4e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
