[
https://issues.apache.org/jira/browse/MESOS-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhitao Li updated MESOS-7366:
-----------------------------
Description:
When 1) a persistent volume is mounted, 2) umount gets stuck or otherwise does
not complete, and 3) executor directory gc is invoked, the agent seems to emit
a log like:
```
Failed to delete directory <executor_dir>/runs/<uuid>/volume: Device or
resource busy
```
After this, the persistent volume directory is empty.
This could cause data loss for critical workloads, so we should fix this ASAP.
The triggering environment is a custom executor w/o rootfs image.
Please let me know if you need more signal.
was:
When 1) a persistent volume is mounted, 2) umount gets stuck or otherwise does
not complete, and 3) executor directory gc is invoked, the agent seems to emit
a log like:
```
Failed to delete directory <executor_dir>/runs/<uuid>/volume: Device or
resource busy
```
The triggering environment is a custom executor w/o rootfs image.
Please let me know if you need more signal.
> Incorrect agent gc could empty up entire persistent volume content
> ------------------------------------------------------------------
>
> Key: MESOS-7366
> URL: https://issues.apache.org/jira/browse/MESOS-7366
> Project: Mesos
> Issue Type: Bug
> Reporter: Zhitao Li
> Assignee: Jie Yu
> Priority: Critical
>
> When 1) a persistent volume is mounted, 2) umount gets stuck or otherwise does
> not complete, and 3) executor directory gc is invoked, the agent seems to emit
> a log like:
> ```
> Failed to delete directory <executor_dir>/runs/<uuid>/volume: Device or
> resource busy
> ```
> After this, the persistent volume directory is empty.
> This could cause data loss for critical workloads, so we should fix this ASAP.
> The triggering environment is a custom executor w/o rootfs image.
> Please let me know if you need more signal.
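
To make the failure mode above concrete: if the volume is still mounted under the executor directory when gc recursively deletes it, the delete can descend into the mount and remove the volume's contents before failing on the mount point itself. The following is only a rough sketch of a pre-delete guard, not the Mesos agent's actual code or the eventual fix. It assumes Linux mountinfo-format text (as found in `/proc/self/mountinfo`), and the function names and sample paths are hypothetical.

```python
import os


def mount_points_under(root, mountinfo_text):
    """Return mount points at or below `root`, given mountinfo-format text."""
    root = os.path.normpath(root)
    prefix = root.rstrip(os.sep) + os.sep
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        # The fifth field is the mount point; spaces are escaped as \040.
        mount_point = os.path.normpath(fields[4].replace("\\040", " "))
        if mount_point == root or mount_point.startswith(prefix):
            found.append(mount_point)
    return found


def safe_to_delete(root, mountinfo_text):
    """True only if nothing is still mounted at or below `root`."""
    return not mount_points_under(root, mountinfo_text)


# Hypothetical sample: a volume still bind-mounted inside a run directory.
sample = (
    "36 25 8:1 /vol /sandbox/runs/abc/volume rw,relatime - ext4 /dev/sda1 rw\n"
    "37 25 0:5 / /proc rw - proc proc rw\n"
)
```

With this sample, `safe_to_delete("/sandbox/runs/abc", sample)` is false, so a gc pass using such a guard would skip that directory instead of recursing into the mounted volume. On a live Linux host the text would come from reading `/proc/self/mountinfo`.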