[
https://issues.apache.org/jira/browse/MESOS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855968#comment-15855968
]
Pierre Cheynier edited comment on MESOS-7007 at 2/7/17 1:23 PM:
----------------------------------------------------------------
Hi [~jieyu], [~gilbert],
I had a discussion with [~jieyu] on Friday about this issue.
Since then, I've run tests on 1.1.0:
* {{--launcher=linux}} doesn't change anything. As discussed with Jie Yu, I was
already using this launcher (it's the default, I believe).
* after removing the {{filesystem/shared}} isolator, the /tmp content is no
longer trashed on container creation/deletion, BUT the /tmp volume feature no
longer works:
** the tmp directory in the sandbox is {{root:root}} with mode {{0777}}, and it
is a pure bind mount rather than an isolated filesystem, meaning that anything
I erase there is erased on the host's /tmp as well;
** I ran into MESOS-6563 when looking at the mounts visible from root:
{noformat}
# There is only 1 task, so theoretically 1 mount
$ mesos-ps --master=127.0.0.1:5050
USER FRAMEWORK TASK SLAVE MEM TIME CPU (allocated)
mara... marathon visibi... mesos-cluster-c... 13.7 MB/42.0 MB 00:00:01.490000 0.2
# But in fact... no!
$ mount | grep "mesos/slaves" | wc -l
56
# 56 is probably the number of containers I launched for my CI tests
$ mount | grep "mesos/slaves" | head -5
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_security.f6152faa-ed32-11e6-b388-02427970a3a5/runs/f74453b6-aa39-456f-a4a1-bd953b870d38/tmp type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_security.f6152faa-ed32-11e6-b388-02427970a3a5/runs/f74453b6-aa39-456f-a4a1-bd953b870d38/tmp type ext4 (rw,relatime,seclabel,data=ordered)
{noformat}
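The duplication above can be quantified by grouping the {{mount}} output by target path; here is a minimal sketch, where the here-doc sample (with shortened, hypothetical path components such as S0/F0/R1) stands in for the real {{mount}} output on the agent:

```shell
# Group mount entries by their target path (field 3 of `mount` output) and
# count how many times each one appears. On a real agent, feed `mount` itself
# instead of the shortened sample below (S0/F0/R1/R2 are hypothetical names).
mount_output=$(cat <<'EOF'
/dev/sda3 on /var/opt/mesos/slaves/S0/frameworks/F0/executors/simplehttp/runs/R1/tmp type ext4 (rw,relatime)
/dev/sda3 on /var/opt/mesos/slaves/S0/frameworks/F0/executors/simplehttp/runs/R1/tmp type ext4 (rw,relatime)
/dev/sda3 on /var/opt/mesos/slaves/S0/frameworks/F0/executors/security/runs/R2/tmp type ext4 (rw,relatime)
EOF
)
# Any count greater than 1 is a mount that was set up more than once and
# never cleaned up.
echo "$mount_output" | awk '{print $3}' | sort | uniq -c | sort -rn
```

Each count above 1 would correspond to a container run whose /tmp bind mount leaked instead of being unmounted on destroy.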
> filesystem/shared and --default_container_info broken since 1.1
> ---------------------------------------------------------------
>
> Key: MESOS-7007
> URL: https://issues.apache.org/jira/browse/MESOS-7007
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.1.0
> Reporter: Pierre Cheynier
>
> I'm facing an issue that prevents me from upgrading to 1.1.0 (the change that
> causes it was introduced in this version):
> I'm using default_container_info to mount a /tmp volume in the container's
> mount namespace from its current sandbox, meaning that each container has a
> dedicated /tmp, thanks to the {{filesystem/shared}} isolator.
> I noticed through our automation pipeline that integration tests were failing,
> and found that this is because the contents of /tmp (the one from the host!)
> are trashed each time a container is created.
> Here is my setup:
> * {{--isolation='cgroups/cpu,cgroups/mem,namespaces/pid,*disk/du,filesystem/shared,filesystem/linux*,docker/runtime'}}
> * {{--default_container_info='\{"type":"MESOS","volumes":\[\{"host_path":"tmp","container_path":"/tmp","mode":"RW"\}\]\}'}}
> I discovered this issue in the early days of 1.1 (end of November; I spoke
> with someone on Slack), but unfortunately had no time to dig further into the
> symptoms.
> I found nothing interesting even when using GLOG_v=3.
> Maybe it's a bad usage of isolators that triggers this issue? If that's the
> case, then at least a documentation update should be done.
> Let me know if more information is needed.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)