[ 
https://issues.apache.org/jira/browse/MESOS-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650806#comment-16650806
 ] 

James Peach edited comment on MESOS-9319 at 10/15/18 9:18 PM:
--------------------------------------------------------------

When using a custom user namespace isolator, the task fails at launch because 
opening devices fails with an {{EPERM}} error. This problem is described in 
[this systemd pull request|https://github.com/systemd/systemd/pull/9483] and this [lxd 
issue|https://github.com/lxc/lxd/issues/4950].

The problem arises in the Mesos containerizer due to the order of operations:

# Clone the containerizer with CLONE_NEWNS
# Mount a tmpfs for the devices
# mknod for the various device nodes

Referring back to the lxd issue, because we do (1) before (2), the tmpfs on 
/dev is marked SB_I_NODEV. Due to the new 4.18 behavior, the mknod in (3) now 
succeeds (see commit 
[55956b59df33|https://github.com/torvalds/linux/commit/55956b59df336f6738da916dbb520b6e37df9fbd]).
 Previously it would fail and we would fall back to bind mounting the device. 
However, even though we created the device, we can't actually open it due to 
the SB_I_NODEV flag on the tmpfs mount. It appears that the purpose of allowing 
mknod is so that containers can create overlayfs whiteouts.
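
The sequence above can be sketched as a small reproduction. This is only an illustration under stated assumptions, not the containerizer's code: it assumes the process enters a new user namespace together with the mount namespace (as the custom isolator implies), that unprivileged user namespaces are enabled, and that the temporary directory stands in for the container's /dev. The function name {{reproduce}} is hypothetical.

```python
import ctypes
import errno
import os
import stat
import tempfile

CLONE_NEWNS = 0x00020000    # new mount namespace
CLONE_NEWUSER = 0x10000000  # new user namespace (assumed, see lead-in)

libc = ctypes.CDLL(None, use_errno=True)


def _check(ret, what):
    """Raise OSError if a libc call returned non-zero."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def reproduce():
    """Run steps (1)-(3) in the calling process and describe the outcome."""
    uid, gid = os.getuid(), os.getgid()
    dev = tempfile.mkdtemp()  # stand-in for the container's /dev

    # (1) Enter new user and mount namespaces, then map ourselves to root.
    _check(libc.unshare(CLONE_NEWUSER | CLONE_NEWNS), "unshare")
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(f"0 {uid} 1")
    with open("/proc/self/gid_map", "w") as f:
        f.write(f"0 {gid} 1")

    # (2) Mount a tmpfs from inside the namespace.
    _check(libc.mount(b"tmpfs", dev.encode(), b"tmpfs",
                      ctypes.c_ulong(0), None), "mount")

    # (3) Create a /dev/null-like node (char device 1:3), then open it.
    node = os.path.join(dev, "null")
    try:
        os.mknod(node, stat.S_IFCHR | 0o666, os.makedev(1, 3))
    except PermissionError:
        return "mknod refused (the pre-4.18 behavior)"
    try:
        os.close(os.open(node, os.O_RDWR))
    except PermissionError as e:
        return f"mknod succeeded but open failed: {errno.errorcode[e.errno]}"
    return "open succeeded"
```

Because {{unshare}} changes the namespaces of the calling process, it is safest to call {{reproduce()}} from a forked child or a throwaway interpreter.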

One approach to deal with this in the Mesos containerizer is to complete the 
device node cleanup that was begun with the linux/devices isolator. This 
approach involves moving all the responsibility for creating devices back to 
the isolators. Then, at containerization time, we simply bind-mount the whole 
of /dev from the per-container staging area. Since the isolators create the 
devices in the host namespace and in the Mesos work directory, none of the 
conditions that trigger the failure would arise.
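
A rough sketch of that split, assuming a hypothetical staging directory and hypothetical function names ({{populate_staging_dev}}, {{bind_mount_dev}}); the device list is only an example set and this is not the Mesos implementation. The first function would run at isolation time in the host namespace (it needs CAP_MKNOD), the second at containerization time.

```python
import ctypes
import os
import stat

MS_BIND = 4096    # bind mount
MS_REC = 16384    # apply recursively to submounts

# Example character devices as (name, major, minor); an assumed set.
DEVICES = [
    ("null", 1, 3),
    ("zero", 1, 5),
    ("full", 1, 7),
    ("random", 1, 8),
    ("urandom", 1, 9),
    ("tty", 5, 0),
]


def populate_staging_dev(staging_dev):
    """Isolation time, host namespace: create the nodes in the staging area."""
    os.makedirs(staging_dev, exist_ok=True)
    for name, major, minor in DEVICES:
        path = os.path.join(staging_dev, name)
        if not os.path.exists(path):
            os.mknod(path, stat.S_IFCHR | 0o666, os.makedev(major, minor))
            os.chmod(path, 0o666)  # mknod's mode is filtered by the umask


def bind_mount_dev(staging_dev, container_dev):
    """Containerization time: bind-mount the whole staging /dev in one step."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.mount(staging_dev.encode(), container_dev.encode(), None,
                     ctypes.c_ulong(MS_BIND | MS_REC), None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"bind mount: {os.strerror(err)}")
```

The point of the design is that no mknod happens on a filesystem mounted from inside the container's namespaces, so the SB_I_NODEV restriction never applies to the nodes the task opens.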

The failure we observed with our tasks was a failure to open {{/dev/null}}, 
when redirecting it as standard input to a child process.
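
For illustration, the failing pattern looks roughly like the following. The helper name {{run_with_null_stdin}} is hypothetical and the code is not taken from the task itself; it only shows where the open would fail in an affected container.

```python
import errno
import os
import subprocess
import sys


def run_with_null_stdin(argv):
    """Run argv with /dev/null as stdin.

    Returns (returncode, stdout) on success, or an error string if
    /dev/null cannot be opened.
    """
    try:
        fd = os.open("/dev/null", os.O_RDONLY)
    except PermissionError as e:
        # In an affected container the launch fails here.
        return f"open(/dev/null) failed: {errno.errorcode[e.errno]}"
    try:
        proc = subprocess.run(argv, stdin=fd, stdout=subprocess.PIPE,
                              check=False)
    finally:
        os.close(fd)
    return proc.returncode, proc.stdout
```

On a healthy host the child simply sees end-of-file on standard input; with the device node on a nodev tmpfs, the call returns the error string instead.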


> Create all container devices at isolation time
> ----------------------------------------------
>
>                 Key: MESOS-9319
>                 URL: https://issues.apache.org/jira/browse/MESOS-9319
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: James Peach
>            Priority: Major
>



