[
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174787#comment-16174787
]
Qian Zhang commented on MESOS-6162:
-----------------------------------
I ran more tests for this performance issue with Mesos itself, rather than only
testing manually with {{dd}} as in my previous post. I used {{mesos-execute}}
to launch a task that runs {{dd}} like this:
{code}
mesos-execute --master=192.168.1.6:5050 --name=test --command="dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync"
{code}
I found that this performance issue *always* happens when the combination of
{{ext4/ext3 with the data=ordered option}} + {{cfq IO scheduler}} is met,
*regardless of whether `cgroups/blkio` isolation is enabled*. That is, when the
combination is met, the task always takes much longer to complete (~16s) than
when it is not met (~1.2s), whether or not `cgroups/blkio` is enabled.
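A rough way to check whether a host matches the combination described above is sketched below. This is only an illustration, not something Mesos provides: the helper names {{active_scheduler}} and {{mount_info}} are made up here, and the device name {{sda}} and mount point {{/}} in the usage comments are assumptions. Note also that {{data=ordered}} may not be listed explicitly in {{/proc/mounts}} when it is the filesystem default.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the IO scheduler and mount options.

# Print the active IO scheduler from a sysfs scheduler file.
# The kernel marks the active one with brackets, e.g. "noop deadline [cfq]".
active_scheduler() {
    # $1: path to a scheduler file such as /sys/block/sda/queue/scheduler
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Print "fstype options" for a mount point from a /proc/mounts-style file.
mount_info() {
    # $1: path to the mounts file, $2: mount point to look up
    awk -v mp="$2" '$2 == mp { print $3, $4 }' "$1"
}

# Example usage (device name and mount point are assumptions):
#   active_scheduler /sys/block/sda/queue/scheduler
#   mount_info /proc/mounts /
```

Both helpers take the file to read as an argument, so they can be pointed at saved copies of the files from an agent host as well as at the live {{/sys}} and {{/proc}} entries.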
So it seems this performance issue has nothing to do with `cgroups/blkio`, since
it happens even when `cgroups/blkio` is not enabled at all. However, one weird
thing I found is that if the process is assigned to the *root* blkio cgroup,
the performance issue does *not* happen even when that combination is met:
{code}
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.19546 s, 428 kB/s <--- No performance issue.
{code}
So the conclusion is that when the combination is met:
# If the process is not assigned to any blkio cgroup (i.e., `cgroups/blkio`
isolation is not enabled), the performance issue will happen.
# If the process is assigned to a sub blkio cgroup (i.e., `cgroups/blkio`
isolation is enabled), the performance issue will happen.
# If the process is assigned to the root blkio cgroup, the performance issue
will not happen.
I think cases 1 and 2 can happen in the Mesos context but not case 3, since a
container launched by Mesos will never be assigned to the root blkio cgroup.
Originally I thought we should add a note about the performance issue to the
`cgroups/blkio` doc, but now I think that may not be the right place to mention
it. Instead, we should add the note to {{mesos-containerizer.md}} and
{{persistent-volume.md}}.
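To see which blkio cgroup a given process belongs to, one option is to read its {{/proc/<pid>/cgroup}} file. The sketch below assumes the cgroup v1 line format {{hierarchy-ID:controller-list:path}}; the function name {{blkio_cgroup}} is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the blkio cgroup path recorded in a
# /proc/<pid>/cgroup-style file (cgroup v1 format assumed).
blkio_cgroup() {
    # $1: path to the cgroup file, e.g. /proc/self/cgroup
    awk -F: '$2 ~ /(^|,)blkio(,|$)/ {
        # Drop the "ID:controllers:" prefix and print the rest as the path.
        sub(/^[^:]*:[^:]*:/, "")
        print
    }' "$1"
}

# Example usage:
#   blkio_cgroup /proc/self/cgroup
#   blkio_cgroup /proc/12345/cgroup   # 12345 is a placeholder PID
```

An output of {{/}} means the process sits in the root blkio cgroup, while any other path means it is in a sub cgroup.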
> Add support for cgroups blkio subsystem blkio statistics.
> ---------------------------------------------------------
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
> Issue Type: Task
> Components: cgroups, containerization
> Reporter: haosdent
> Assignee: Jason Lai
> Labels: cgroups, containerizer, mesosphere
> Fix For: 1.4.0
>
>
> Note that the cgroups blkio subsystem may have a performance issue; refer to
> https://github.com/opencontainers/runc/issues/861