[ https://issues.apache.org/jira/browse/MESOS-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Pronin updated MESOS-7795:
-------------------------------
    Description: 
Currently, when the agent detects that the host was rebooted, it doesn't recover
agent info. New agent info is not checkpointed until the agent successfully
registers with a master. If the agent crashes before registering, on restart it
will recover the old agent info that was checkpointed before the host reboot.

This can lead to problems. For example, the agent may flap due to incompatible
agent info if its resources change after the reboot. Or reusing the old agent ID
during reregistration may cause crashes like MESOS-7432.

We can remove the "latest" symlink when we detect that the current boot ID is
different from the checkpointed one, in order to prevent the agent from
recovering stale info after we checkpoint the new boot ID. Or we can postpone
boot ID checkpointing until we have checkpointed the new agent info.
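A minimal Python sketch of the failure and of the two proposed orderings follows. It assumes a simplified on-disk layout (`boot_id`, `slaves/<agent_id>/slave.info`, and a `slaves/latest` symlink); `MetaDir`, `recover_current`, `recover_remove_latest`, `recover_postpone_boot_id`, `on_registered`, and `crash_before_registering` are hypothetical illustrations, not the agent's actual C++ recovery code.

```python
import tempfile
from pathlib import Path


class MetaDir:
    """Toy model of the agent's checkpoint directory (hypothetical layout):
      <root>/boot_id                      checkpointed boot ID
      <root>/slaves/<agent_id>/slave.info checkpointed agent info
      <root>/slaves/latest -> <agent_id>  symlink that recovery follows
    """

    def __init__(self, root: Path):
        self.root = root
        self.latest = root / "slaves" / "latest"
        (root / "slaves").mkdir(parents=True, exist_ok=True)

    def boot_id(self):
        path = self.root / "boot_id"
        return path.read_text() if path.exists() else None

    def checkpoint_boot_id(self, boot_id: str) -> None:
        (self.root / "boot_id").write_text(boot_id)

    def checkpoint_agent_info(self, agent_id: str, info: str) -> None:
        agent_dir = self.root / "slaves" / agent_id
        agent_dir.mkdir(exist_ok=True)
        (agent_dir / "slave.info").write_text(info)
        if self.latest.is_symlink():
            self.latest.unlink()
        self.latest.symlink_to(agent_id)  # relative target inside slaves/

    def agent_info(self):
        # Recovery reads whatever "latest" points at, if it exists.
        if not self.latest.is_symlink():
            return None
        return (self.latest / "slave.info").read_text()


def recover_current(meta: MetaDir, boot_id: str):
    """Current order: skip recovery after a reboot, but checkpoint the new boot ID at once."""
    rebooted = meta.boot_id() != boot_id
    info = None if rebooted else meta.agent_info()
    meta.checkpoint_boot_id(boot_id)
    return info


def recover_remove_latest(meta: MetaDir, boot_id: str):
    """Option 1: on a boot ID mismatch, drop "latest" before checkpointing the new boot ID."""
    rebooted = meta.boot_id() != boot_id
    if rebooted and meta.latest.is_symlink():
        meta.latest.unlink()  # old slave.info stays on disk but is no longer recovered
    info = None if rebooted else meta.agent_info()
    meta.checkpoint_boot_id(boot_id)
    return info


def recover_postpone_boot_id(meta: MetaDir, boot_id: str):
    """Option 2: leave the old boot ID in place during recovery."""
    return None if meta.boot_id() != boot_id else meta.agent_info()


def on_registered(meta: MetaDir, agent_id: str, info: str, boot_id: str) -> None:
    """Option 2, continued: checkpoint the new agent info first, then the boot ID."""
    meta.checkpoint_agent_info(agent_id, info)
    meta.checkpoint_boot_id(boot_id)


def crash_before_registering(recover):
    """Reboot, start once, crash before registering, start again; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        meta = MetaDir(Path(tmp))
        meta.checkpoint_boot_id("boot-1")
        meta.checkpoint_agent_info("agent-A", "old resources")
        first = recover(meta, "boot-2")   # host rebooted: boot-1 -> boot-2
        second = recover(meta, "boot-2")  # restart after the crash
        return first, second


if __name__ == "__main__":
    for fn in (recover_current, recover_remove_latest, recover_postpone_boot_id):
        print(fn.__name__, crash_before_registering(fn))
```

In this model the current ordering returns `(None, 'old resources')`: the second start sees a matching boot ID and follows "latest" to the pre-reboot info. Both options return `(None, None)`. Option 1 keeps the stale checkpoint on disk but unreachable; Option 2 deletes nothing, at the cost of treating every restart as post-reboot until registration succeeds.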

  was:
Currently, when the agent detects that the host was rebooted, it doesn't recover
agent info. New agent info is not checkpointed until the agent successfully
registers with a master. If the agent crashes before registering, on restart it
will recover the old agent info that was checkpointed before the host reboot.

This can lead to problems. For example, the agent may flap due to incompatible
agent info if its resources change after the reboot. Or reusing the old agent ID
during reregistration may cause crashes like MESOS-7432.

We can remove the "latest" symlink when we detect that the current boot ID is
different from the checkpointed one, in order to prevent the agent from
recovering stale info after we checkpoint the new boot ID.


> Remove "latest" symlink after agent reboot
> ------------------------------------------
>
>                 Key: MESOS-7795
>                 URL: https://issues.apache.org/jira/browse/MESOS-7795
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Ilya Pronin
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
