[
https://issues.apache.org/jira/browse/MESOS-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neil Conway updated MESOS-2408:
-------------------------------
Description:
At present, destroying a persistent volume does not clean up any filesystem
space that was used by the volume (it just removes the Mesos-level metadata
about the volume). This effectively leads to a storage leak, which is bad. For
task sandboxes, we do "garbage collection" to remove the sandbox at a later
time to facilitate debugging failed tasks; for volumes, because they are
explicitly deleted and are not tied to the lifecycle of a task, removing the
associated storage immediately seems best.
To implement this safely, we'll either need to ensure that libprocess messages
are delivered in order, or else add some extra safeguards to ensure that
out-of-order {{CheckpointResources}} messages don't lead to accidental data
loss.
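One possible safeguard (a minimal sketch only, not actual Mesos code; the
message struct and its `sequence` field are hypothetical) is to tag each
{{CheckpointResources}} message with a monotonically increasing sequence
number, so the slave can discard any message that arrives out of order
instead of applying a stale resource view that could delete live volume data:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message: a sequence number plus the set of persistent
// volume ids that should currently exist on the slave.
struct CheckpointResourcesMessage {
  uint64_t sequence;                 // assumed ordering field (not in Mesos)
  std::vector<std::string> volumes;  // persistent volume ids to keep
};

class Slave {
public:
  // Apply the checkpoint only if it is newer than the last one applied.
  // Returns true if applied, false if discarded as stale.
  bool handle(const CheckpointResourcesMessage& message) {
    if (message.sequence <= lastApplied) {
      return false;  // out-of-order: applying it could destroy live volumes
    }
    lastApplied = message.sequence;
    volumes = message.volumes;
    return true;
  }

  uint64_t lastApplied = 0;
  std::vector<std::string> volumes;
};
```

With this guard, a delayed older message is simply dropped, so reordering can
delay reclamation but never resurrect a stale view of the volumes.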
was:This is tricky in the case where a persistence id is re-used. When a
persistent volume is destroyed explicitly by the framework, the master deletes
all information about that volume. That means the master no longer has the
ability to check whether the persistence id is re-used (and to reject the
later attempt). On the slave side, we'll use some GC policy to remove
directories associated with deleted persistent volumes (similar to how we GC
sandboxes). That means the persistent volume directory won't be deleted
immediately when the volume is destroyed explicitly by the framework. When the
same persistence id is re-used, we'll see that the persistent volume still
exists and we need to cancel the GC of that directory (similar to how we
cancel the GC of meta directories during runTask).
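The superseded GC-based approach above can be sketched as follows (a
hypothetical illustration only; the `VolumeGC` class and its method names are
invented, not Mesos APIs): destroyed volume directories are queued for delayed
collection, and a re-used persistence id cancels the pending collection so the
existing directory is kept.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of delayed GC with cancellation on id re-use,
// analogous to how sandbox GC is unscheduled during runTask.
class VolumeGC {
public:
  // Queue a destroyed volume's directory for later removal.
  void schedule(const std::string& persistenceId, const std::string& path) {
    pending[persistenceId] = path;
  }

  // Called when a persistence id is re-used; returns true if a pending
  // GC was cancelled, meaning the existing directory can be kept.
  bool unschedule(const std::string& persistenceId) {
    return pending.erase(persistenceId) > 0;
  }

  bool isPending(const std::string& persistenceId) const {
    return pending.count(persistenceId) > 0;
  }

private:
  std::map<std::string, std::string> pending;  // persistence id -> directory
};
```

The updated description rejects this design in favor of immediate removal,
since volumes are deleted explicitly and don't need the debugging grace
period that sandboxes do.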
> Slave should reclaim storage for destroyed persistent volumes.
> --------------------------------------------------------------
>
> Key: MESOS-2408
> URL: https://issues.apache.org/jira/browse/MESOS-2408
> Project: Mesos
> Issue Type: Task
> Components: slave
> Reporter: Jie Yu
> Assignee: Neil Conway
> Labels: mesosphere, persistent-volumes
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)