[ https://issues.apache.org/jira/browse/MESOS-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669372#comment-16669372 ]
Chun-Hung Hsiao commented on MESOS-8507:
----------------------------------------
[~xujyan] We revisited this issue recently. The current decoupling of
reservations from agent IDs is a legacy issue. Conceptually, each distinct agent
ID is a new, different agent to Mesos, and so are its resources, so it does not
make sense to keep an old reservation on "new" resources. We have this
unfortunate legacy decoupling because we didn't have a proper way to preserve
agent IDs at the time persistent volumes were implemented. The situation has
improved since then, but we're not there yet: as you mentioned, there are still
some cases where we need to discard an agent ID.
The current solution we (Mesosphere) have is to use the "default_reservation"
field in the resource provider info to reserve pre-existing volumes for a
certain role, so that a special framework can register with that role to
perform recovery.
Alternatives include adding CSI credential support to `NodePublishVolume`
calls, so that the CSI plugin can check whether a given workload is authorized
to use the volume. This will eventually be supported, but it is not currently
prioritized.
We could still work around this by somehow preserving the reservations, but I'd
rather avoid that route because it conflicts with the long-term direction we'd
like to move in. WDYT?
cc [~jieyu]
> SLRP discards reservations when the agent is discarded, which could lead to
> leaked volumes.
> -------------------------------------------------------------------------------------------
>
> Key: MESOS-8507
> URL: https://issues.apache.org/jira/browse/MESOS-8507
> Project: Mesos
> Issue Type: Bug
> Reporter: Yan Xu
> Priority: Major
> Labels: storage
>
> In the current SLRP implementation, the reservations for new SLRP/CSI-backed
> volumes are checkpointed under {{<meta>/slaves/latest/resource_providers}}, so
> when the agent runs into incompatible configuration changes (the kinds that
> cannot be addressed by MESOS-1739), the operator has to remove the symlink,
> and then the reservations are gone.
> The agent then recovers with a new {{SlaveInfo}}, and new SLRPs are created to
> recover the CSI volumes. These CSI volumes will not have reservations and
> will thus be offered to frameworks of any role, potentially still holding data
> written by the previous owner.
>
> The framework has no control over this and no chance to clean up before the
> volumes are re-offered, which is undesirable for security reasons.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)