[ 
https://issues.apache.org/jira/browse/MESOS-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669372#comment-16669372
 ] 

Chun-Hung Hsiao commented on MESOS-8507:
----------------------------------------

[~xujyan] We revisited this issue recently. The current decoupling of 
reservation and agent ID is a legacy issue. Conceptually, each distinct agent 
ID is a new, different agent to Mesos, and so are its resources, so it does not 
make sense to keep an old reservation on "new" resources. We have this 
unfortunate legacy decoupling because we didn't have a proper way to preserve 
agent IDs at the time persistent volumes were implemented. The situation has 
improved since then, but we're not there yet: as you mentioned, there are 
still some cases where we need to discard an agent ID.

The current solution we (Mesosphere) have is to use the "default_reservation" 
field in the resource provider info to reserve pre-existing volumes to a 
certain role, and a special framework can register with that role to do some 
recovery.
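
To illustrate the workaround above, here is a minimal sketch of what such a 
resource provider config might look like, built in Python and printed as JSON. 
It assumes the JSON field is spelled `default_reservations` (a repeated 
field, as I recall it from `ResourceProviderInfo`). The role name 
`storage-recovery`, the provider name, and the plugin type and name are 
hypothetical placeholders, and the plugin's container definitions are omitted 
entirely.

```python
import json


def make_provider_config(name, role, plugin_name):
    """Build a hypothetical SLRP config that reserves its resources to `role`."""
    return {
        "type": "org.apache.mesos.rp.local.storage",
        "name": name,
        # Assumed field name: resources from this provider, including
        # pre-existing volumes, would be reserved to `role` by default.
        "default_reservations": [{"type": "DYNAMIC", "role": role}],
        # Placeholder plugin section; a real config also needs the plugin's
        # container definitions, which are left out of this sketch.
        "storage": {
            "plugin": {
                "type": "org.apache.mesos.csi.example",
                "name": plugin_name,
            }
        },
    }


config = make_provider_config("example_slrp", "storage-recovery", "example_plugin")
print(json.dumps(config, indent=2))
```

A recovery framework registered with the same role would then be the only 
one offered those volumes, which gives it a chance to clean them up or hand 
them back to their previous owner.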

An alternative is to add CSI credential support to `NodePublishVolume` calls, 
so that the CSI plugin can decide whether a given workload is authorized to 
use the volume. This will eventually be supported, but it is not currently 
prioritized.

We could still work around this by somehow preserving the reservation, but 
I'd rather avoid this route, as it conflicts with the long-term direction 
we'd like to move in. WDYT?

cc [~jieyu]

> SLRP discards reservations when the agent is discarded, which could lead to 
> leaked volumes.
> -------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8507
>                 URL: https://issues.apache.org/jira/browse/MESOS-8507
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>            Priority: Major
>              Labels: storage
>
> In the current SLRP implementation the reservations for new SLRP/CSI backed 
> volumes are checkpointed under {{<meta>/slaves/latest/resource_providers}} so 
> when the agent runs into incompatible configuration changes (the kinds that 
> cannot be addressed by MESOS-1739), the operator has to remove the symlink 
> and then the reservations are gone. 
> Then the agent recovers with a new {{SlaveInfo}} and new SLRPs are created to 
> recover the CSI volumes. These CSI volumes will not have reservations and 
> thus will be offered to frameworks of any role, potentially with the data 
> already written by the previous owner. 
>  
> The framework doesn't have any control over this and any chance to clean up 
> before the volumes are re-offered, which is undesired for security reasons.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
