[
https://issues.apache.org/jira/browse/MESOS-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neil Conway updated MESOS-2408:
-------------------------------
Description:
At present, destroying a persistent volume does not clean up any filesystem
space that was used by the volume (it just removes the Mesos-level metadata
about the volume). This effectively leads to a storage leak, which is bad. For
task sandboxes, we do "garbage collection" to remove the sandbox at a later
time to facilitate debugging failed tasks; for volumes, because they are
explicitly deleted and are not tied to the lifecycle of a task, removing the
associated storage immediately seems best.
To implement this safely, we'll either need to ensure that libprocess messages
are delivered in order, or else add some extra safeguards to ensure that
out-of-order {{CheckpointResources}} messages don't lead to accidental data
loss.
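One possible safeguard (a minimal sketch only, not actual Mesos code; the
message struct and its `sequence` field are hypothetical) is to tag each
{{CheckpointResources}} message with a monotonically increasing sequence
number, so the slave can discard any message that arrives out of order
instead of applying a stale resource view that could delete live volume data:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message: a sequence number plus the set of persistent
// volume ids that should currently exist on the slave.
struct CheckpointResourcesMessage {
  uint64_t sequence;                 // assumed ordering field (not in Mesos)
  std::vector<std::string> volumes;  // persistent volume ids to keep
};

class Slave {
public:
  // Apply the checkpoint only if it is newer than the last one applied.
  // Returns true if applied, false if discarded as stale.
  bool handle(const CheckpointResourcesMessage& message) {
    if (message.sequence <= lastApplied) {
      return false;  // out-of-order: applying it could destroy live volumes
    }
    lastApplied = message.sequence;
    volumes = message.volumes;
    return true;
  }

  uint64_t lastApplied = 0;
  std::vector<std::string> volumes;
};
```

With this guard, a delayed older message is simply dropped, so reordering can
delay reclamation but never resurrect a stale view of the volumes.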
was:This is tricky in the case where a persistence id is re-used. When a
persistent volume is destroyed explicitly by the framework, the master deletes
all information about that volume. That means the master no longer has the
ability to check whether the persistence id is re-used (and to reject the
later attempt). On the slave side, we'll use some GC policy to remove
directories associated with deleted persistent volumes (similar to how we GC
sandboxes). That means the persistent volume directory won't be deleted
immediately when the volume is destroyed explicitly by the framework. When the
same persistence id is re-used, we'll see that the persistent volume still
exists and we need to cancel the GC of that directory (similar to how we
cancel the GC of meta directories during runTask).
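The superseded GC-based approach above can be sketched as follows (a
hypothetical illustration only; the `VolumeGC` class and its method names are
invented, not Mesos APIs): destroyed volume directories are queued for delayed
collection, and a re-used persistence id cancels the pending collection so the
existing directory is kept.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of delayed GC with cancellation on id re-use,
// analogous to how sandbox GC is unscheduled during runTask.
class VolumeGC {
public:
  // Queue a destroyed volume's directory for later removal.
  void schedule(const std::string& persistenceId, const std::string& path) {
    pending[persistenceId] = path;
  }

  // Called when a persistence id is re-used; returns true if a pending
  // GC was cancelled, meaning the existing directory can be kept.
  bool unschedule(const std::string& persistenceId) {
    return pending.erase(persistenceId) > 0;
  }

  bool isPending(const std::string& persistenceId) const {
    return pending.count(persistenceId) > 0;
  }

private:
  std::map<std::string, std::string> pending;  // persistence id -> directory
};
```

The updated description rejects this design in favor of immediate removal,
since volumes are deleted explicitly and don't need the debugging grace
period that sandboxes do.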
> Slave should reclaim storage for destroyed persistent volumes.
> --------------------------------------------------------------
>
> Key: MESOS-2408
> URL: https://issues.apache.org/jira/browse/MESOS-2408
> Project: Mesos
> Issue Type: Task
> Components: slave
> Reporter: Jie Yu
> Assignee: Neil Conway
> Labels: mesosphere, persistent-volumes
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)