[https://issues.apache.org/jira/browse/MESOS-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Neil Conway updated MESOS-4894:
-------------------------------
Description:
If an agent fails health checks, it is removed from the cluster. The next time
the agent connects to the master, it is instructed to shut down and all
tasks/executors are killed. The next time the agent is started, it will be
assigned a new agent ID. Any persistent volumes from the previous agent
instance will be preserved, but they will now be associated with the new agent ID.
This is problematic because volume IDs do not need to be globally unique.
Hence, it is natural for frameworks to use the pair {{<agent-id, volume-id>}}
to uniquely identify a volume. If volume k moves from agent foo to agent bar,
it is hard for frameworks to determine whether {{<bar, k>}} is the "same" volume
that was previously called {{<foo, k>}}. They might be able to infer this from
`slaveLost` callbacks, but those callbacks are not reliable. Similarly, the HTTP
endpoints for persistent volumes and dynamic reservations identify resources by
slave (agent) ID, so the same ambiguity affects them.
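The following is a minimal sketch of the framework-side bookkeeping described above. It assumes a framework that tracks persistent volumes in a dictionary keyed by {{<agent-id, volume-id>}}. `VolumeKey`, `VolumeRegistry`, the data tags, and the agent IDs `foo`, `bar` and `baz` are hypothetical illustrations, not part of the Mesos API. The sketch shows why the agent ID has to be part of the key, and why a preserved volume becomes unrecognizable once the agent re-registers under a new ID.

```python
from typing import Dict, NamedTuple, Optional


class VolumeKey(NamedTuple):
    """Hypothetical framework-side identity for a persistent volume.

    Volume IDs need not be globally unique, so the agent ID is part of the key.
    """
    agent_id: str
    volume_id: str


class VolumeRegistry:
    """Hypothetical framework bookkeeping keyed by <agent-id, volume-id>."""

    def __init__(self) -> None:
        self._volumes: Dict[VolumeKey, str] = {}

    def record(self, agent_id: str, volume_id: str, data_tag: str) -> None:
        # Remember which data the framework believes lives in this volume.
        self._volumes[VolumeKey(agent_id, volume_id)] = data_tag

    def lookup(self, agent_id: str, volume_id: str) -> Optional[str]:
        # Returns None if the framework has never seen this exact pair.
        return self._volumes.get(VolumeKey(agent_id, volume_id))


registry = VolumeRegistry()

# The framework creates volume "k" on the agent registered as "foo".
registry.record("foo", "k", "replica-1")
# A different agent may legitimately also have a volume named "k",
# which is why the volume ID alone cannot serve as the key.
registry.record("baz", "k", "replica-2")

# The agent "foo" fails health checks, is removed, and later re-registers
# as "bar". The volume and its data survive on disk, but the volume is
# now reported under the new agent ID.
offered_key = VolumeKey("bar", "k")

# The lookup under the new ID misses: nothing in the key tells the
# framework that <bar, k> is the volume it previously knew as <foo, k>.
print(registry.lookup(*offered_key))  # None
print(registry.lookup("foo", "k"))    # replica-1
```

In this sketch, a framework that relies only on the key would treat {{<bar, k>}} as a new, unknown volume, while {{<foo, k>}} lingers as a stale entry.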
was:
If an agent fails health checks, it is removed from the cluster. The next time
the agent connects to the master, it is instructed to shut down and all
tasks/executors are killed. The next time the agent is started, it will be
assigned a new agent ID. Any persistent volumes from the previous agent
instance will be preserved, but they will now be associated with a new agent ID.
This is problematic because volume IDs do not need to be globally unique.
Hence, it is natural for frameworks to use the pair {{<agent-id, volume-id>}}
to uniquely identify a volume. If volume k moves from agent foo to agent bar,
it is hard for frameworks to determine whether <bar,k> is the "same" volume
that was previously called <foo,k> (they might be able to figure this out from
`slaveLost` callbacks, but those aren't reliable).
> Volumes, reservations can move to new agent IDs after partition
> ---------------------------------------------------------------
>
> Key: MESOS-4894
> URL: https://issues.apache.org/jira/browse/MESOS-4894
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Neil Conway
> Labels: mesosphere, persistent-volumes
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)