[ 
https://issues.apache.org/jira/browse/MESOS-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4894:
-------------------------------
    Description: 
If an agent fails health checks, it is removed from the cluster. The next time 
the agent connects to the master, it is instructed to shut down and all 
tasks/executors are killed. The next time the agent is started, it will be 
assigned a new agent ID. Any persistent volumes from the previous agent 
instance will be preserved, but they will now be associated with a new agent ID.

This is problematic because volume IDs do not need to be globally unique. 
Hence, it is natural for frameworks to use the pair {{<agent-id, volume-id>}} 
to uniquely identify a volume. If volume k moves from agent foo to agent bar, 
it is hard for frameworks to determine whether <bar,k> is the "same" volume 
that was previously called <foo,k> (they might be able to figure this out from 
`slaveLost` callbacks, but those aren't reliable). Similarly, the HTTP 
endpoints for volumes and dynamic reservations include a slave ID.
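
To make the identification problem concrete, here is a minimal sketch of framework-side bookkeeping keyed by {{<agent-id, volume-id>}}. The {{VolumeRegistry}} class, its methods, and the owner name "database-replica-1" are hypothetical and used only for illustration; they are not part of the Mesos API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key type: (agent_id, volume_id), as described above.
VolumeKey = Tuple[str, str]


class VolumeRegistry:
    """Illustrative framework-side map from <agent-id, volume-id> to an owner."""

    def __init__(self) -> None:
        self._owners: Dict[VolumeKey, str] = {}

    def record(self, agent_id: str, volume_id: str, owner: str) -> None:
        # Remember which part of the framework owns this volume.
        self._owners[(agent_id, volume_id)] = owner

    def lookup(self, agent_id: str, volume_id: str) -> Optional[str]:
        # Returns None when the pair has never been recorded.
        return self._owners.get((agent_id, volume_id))


registry = VolumeRegistry()
registry.record("foo", "k", "database-replica-1")

# Agent "foo" is removed after a partition and re-registers as "bar".
# The data on disk is unchanged, but the framework's key no longer matches.
print(registry.lookup("bar", "k"))  # no entry under the new agent ID
print(registry.lookup("foo", "k"))  # stale entry under the old agent ID
```

Under these assumptions, nothing in the registry links <bar,k> back to <foo,k>, so the framework cannot tell a moved volume from a brand-new one without extra information.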

  was:
If an agent fails health checks, it is removed from the cluster. The next time 
the agent connects to the master, it is instructed to shut down and all 
tasks/executors are killed. The next time the agent is started, it will be 
assigned a new agent ID. Any persistent volumes from the previous agent 
instance will be preserved, but they will now be associated with a new agent ID.

This is problematic because volume IDs do not need to be globally unique. 
Hence, it is natural for frameworks to use the pair {{<agent-id, volume-id>}} 
to uniquely identify a volume. If volume k moves from agent foo to agent bar, 
it is hard for frameworks to determine whether <bar,k> is the "same" volume 
that was previously called <foo,k> (they might be able to figure this out from 
`slaveLost` callbacks, but those aren't reliable).


> Volumes, reservations can move to new agent IDs after partition
> ---------------------------------------------------------------
>
>                 Key: MESOS-4894
>                 URL: https://issues.apache.org/jira/browse/MESOS-4894
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Neil Conway
>              Labels: mesosphere, persistent-volumes
>
> If an agent fails health checks, it is removed from the cluster. The next 
> time the agent connects to the master, it is instructed to shut down and all 
> tasks/executors are killed. The next time the agent is started, it will be 
> assigned a new agent ID. Any persistent volumes from the previous agent 
> instance will be preserved, but they will now be associated with a new agent 
> ID.
>
> This is problematic because volume IDs do not need to be globally unique. 
> Hence, it is natural for frameworks to use the pair {{<agent-id, volume-id>}} 
> to uniquely identify a volume. If volume k moves from agent foo to agent bar, 
> it is hard for frameworks to determine whether <bar,k> is the "same" volume 
> that was previously called <foo,k> (they might be able to figure this out 
> from `slaveLost` callbacks, but those aren't reliable). Similarly, the HTTP 
> endpoints for volumes and dynamic reservations include a slave ID.


