[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738557#comment-14738557
 ] 

haosdent edited comment on MESOS-3349 at 9/11/15 9:30 AM:
----------------------------------------------------------

According to
[CLONE_NEWNS|http://stackoverflow.com/questions/22889241/linux-understanding-the-mount-namespace-clone-clone-newns-flag]
and [bind mounts|https://lwn.net/Articles/159092/], I think the following could
explain the behaviour so far.

In LinuxFilesystemIsolatorProcess, we mount the persistent volume (with the
default make-private propagation) before launching the executor. After
LinuxLauncher forks with CLONE_NEWNS, we can unmount the persistent volume in
LinuxFilesystemIsolatorProcess, but this does not affect the executor, which
continues to hold that mount point in its own mount namespace. When the slave
receives TASK_FINISHED and LinuxFilesystemIsolatorProcess tries to rmdir that
mount point, the rmdir fails because the executor is still running and holding
the mount point (I observed this after adding some trace code that logs when
the executor exits). So a possible fix is to use make-shared or make-slave when
mounting the persistent volume, so that the unmount propagates to the
executor's namespace. But my attempts at this have failed so far.
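For reference, "make-shared" on an existing mount point is a single mount(2)
call with the MS_SHARED flag (this is what {{mount --make-shared}} does). A
minimal ctypes sketch, assuming a Linux host; the actual call needs root, so it
is only shown in a comment, and the target path is hypothetical:

```python
import ctypes
import ctypes.util
import os

# Propagation flags from <sys/mount.h>.
MS_REC     = 0x4000    # apply the change recursively to the whole subtree
MS_PRIVATE = 1 << 18   # no event propagation (the default "make-private")
MS_SLAVE   = 1 << 19   # receive propagation events, never send them
MS_SHARED  = 1 << 20   # bidirectional propagation within the peer group

def set_propagation(target, flag):
    """Re-mark an existing mount point, like `mount --make-shared <target>`."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # For a propagation change, source and fstype are ignored by the kernel.
    if libc.mount(b"none", target.encode(), None, flag, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

# Requires root and an existing mount point, e.g.:
# set_propagation("/mnt", MS_SHARED)
```

With MS_SHARED (or MS_SLAVE on the executor's side), an unmount in the parent
namespace should propagate into the cloned namespace instead of leaving a
private copy behind.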

{code}
45 22 8:3 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/volumes/roles/role1/id1 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/slaves/20150911-170559-162297291-49192-21628-S0/frameworks/20150911-170559-162297291-49192-21628-0000/executors/c6bcf76f-7cf5-42e6-8eb8-2d21e393ba3d/runs/454bbfa3-0305-4900-b05d-389f6b215c32/path1
 rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}

{code}
78 48 8:3 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/volumes/roles/role1/id1 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/slaves/20150911-170559-162297291-49192-21628-S0/frameworks/20150911-170559-162297291-49192-21628-0000/executors/c6bcf76f-7cf5-42e6-8eb8-2d21e393ba3d/runs/454bbfa3-0305-4900-b05d-389f6b215c32/path1
 rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}

As the mount tables above show, the persistent volume has already been mounted
as shared, but this test still fails.
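One way to double-check the propagation mode of each entry is to parse the
optional tagged fields ({{shared:N}}, {{master:N}}) in
/proc/&lt;pid&gt;/mountinfo. A minimal sketch, not Mesos code:

```python
def propagation(mountinfo_line):
    """Return the propagation type encoded in one /proc/<pid>/mountinfo line.

    Fields 7+ (everything before the "-" separator) are optional tagged
    fields: "shared:N" marks a shared mount, "master:N" a slave; a line
    with neither tag is a private mount.
    """
    fields = mountinfo_line.split()
    tags = {f.split(":")[0] for f in fields[6:fields.index("-")]}
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    return "private"

# A line shaped like the first entry quoted above (paths shortened):
sample = ("45 22 8:3 /volumes/roles/role1/id1 /mnt/path1 "
          "rw,relatime shared:1 - ext4 /dev/sda1 rw,data=ordered")
# propagation(sample) → "shared"
```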


was (Author: [email protected]):
After reading
http://stackoverflow.com/questions/22889241/linux-understanding-the-mount-namespace-clone-clone-newns-flag
about CLONE_NEWNS, I think the following could explain the behaviour so far.

In LinuxFilesystemIsolatorProcess, we mount in the parent (pid 24073):
{code}
I0910 18:07:42.768034 24073 linux.cpp:598] Mounting 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/volumes/roles/role1/id1'
 to 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-0000/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1'
 for persistent volume disk(role1)[id1:path1]:64 of container 
0cdc0d01-4c59-48e8-925a-7a6c06feb2ae
{code}

After LinuxLauncher forks with CLONE_NEWNS, the child (pid 24071) can unmount
it, but still cannot rmdir it, because the parent's mount namespace still holds
a mount at that path:
{code}
I0910 18:07:44.868654 24071 linux.cpp:493] Removing mount 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-0000/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1'
 for persistent volume disk(role1)[id1:path1]:64 of container 
0cdc0d01-4c59-48e8-925a-7a6c06feb2ae
E0910 18:07:44.876619 24076 slave.cpp:2870] Failed to update resources for 
container 0cdc0d01-4c59-48e8-925a-7a6c06feb2ae of executor 
72989615-cc6e-449c-a561-264fcee7edc3 running task 
72989615-cc6e-449c-a561-264fcee7edc3 on status update for terminal task, 
destroying container: Collect failed: Failed to remove persistent volume mount 
point at 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-0000/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1':
 Device or resource busy
{code}
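The "Device or resource busy" above is rmdir(2) returning EBUSY while the
directory is still in use as a mount point somewhere. A generic, purely
illustrative retry helper (not the actual Mesos fix) for transient EBUSY during
teardown:

```python
import errno
import os
import time

def rmdir_with_retry(path, attempts=5, delay=0.1):
    """Remove a directory, retrying while the kernel reports EBUSY.

    rmdir(2) fails with EBUSY while the directory is still held as a mount
    point; once the last holder (e.g. an exiting executor's mount namespace)
    releases it, the removal succeeds.
    """
    for i in range(attempts):
        try:
            os.rmdir(path)
            return
        except OSError as e:
            if e.errno != errno.EBUSY or i == attempts - 1:
                raise
            time.sleep(delay)
```

Note that in the failure above retrying alone would not help: with private
propagation the executor's copy of the mount never goes away until the
executor exits, which is why the propagation mode matters.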


> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> -------------------------------------------------------------------
>
>                 Key: MESOS-3349
>                 URL: https://issues.apache.org/jira/browse/MESOS-3349
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>         Environment: Ubuntu 14.04, CentOS 5
>            Reporter: Benjamin Mahler
>            Assignee: haosdent
>              Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN      ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
