[ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-9507:
-----------------------------------

    Assignee: Gilbert Song
      Sprint: Containerization RI10 Spr 39
Story Points: 5

> Agent could not recover due to empty docker volume checkpointed files.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-9507
>                 URL: https://issues.apache.org/jira/browse/MESOS-9507
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Gilbert Song
>            Assignee: Gilbert Song
>            Priority: Critical
>              Labels: containerizer
>
> Agent could not recover due to empty docker volume checkpointed files. Please
> see logs:
> {noformat}
> Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969
> slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect
> failed: Collect failed: Failed to recover docker volumes for orphan container
> e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line
> 1 near:
> Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows:
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f
> /var/lib/mesos/slave/meta/slaves/latest
> Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover
> old live executors.
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process
> exited, code=exited, status=1/FAILURE
> Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered
> failed state.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed.
> {noformat}
> This is caused by agent recovery after the volume state file is created but
> before checkpointing finishes. Basically the docker volume is not mounted
> yet, so the docker volume isolator should skip recovering this volume.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
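
The behaviour proposed in the description (skip a volume whose checkpoint file exists but is still empty, instead of failing recovery) can be sketched roughly as follows. This is only an illustration in Python, not the Mesos C++ implementation: the function name `recover_volume_state`, the file names, and the JSON layout are all hypothetical. The one grounded assumption is that an empty file is what yields the "JSON parse failed: syntax error at line 1" message in the log.

```python
import json
import os
import tempfile


def recover_volume_state(path):
    """Return the parsed checkpoint, or None if the volume should be skipped.

    Hypothetical helper: an empty file is taken to mean the checkpoint was
    created but never finished, so the volume was never mounted.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()
    if not data.strip():
        # Empty checkpoint: skip this volume rather than abort recovery.
        return None
    # Non-empty but malformed content still raises, so real corruption
    # is not silently ignored.
    return json.loads(data)


# Small demonstration with made-up file names and a made-up state layout.
with tempfile.TemporaryDirectory() as state_dir:
    empty = os.path.join(state_dir, "empty_volume_state")
    open(empty, "w").close()  # simulates a crash right after file creation

    complete = os.path.join(state_dir, "complete_volume_state")
    with open(complete, "w", encoding="utf-8") as f:
        json.dump({"volumes": []}, f)

    skipped = recover_volume_state(empty)
    recovered = recover_volume_state(complete)
```

Only the empty-file case is treated as skippable here. A partially written, non-empty file would still fail to parse, which seems the safer default for a sketch like this.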