[
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163417#comment-15163417
]
Joseph Wu commented on MESOS-4677:
----------------------------------
My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} comes right after
we isolate the test process, by writing to {{cgroup.procs}}. Underneath, the
cgroups API probably blocks the write from completing until the cgroups are
updated.
# We do an {{os::close}} on a parent pipe to trigger the test process to call
{{exec}}.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change, since {{exec}} doesn't change the PID. But
there may be a race between the kernel updating the "threads"
({{cgroup/tasks}}) and us reading {{cgroup/tasks}}.
We can either:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}}
to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Do some sort of operation on the test process (which would confirm that it
has finished calling {{exec}}). In this case we can write to the {{cat}} test
process and read the echoed result.
> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> -----------------------------------------------------------
>
> Key: MESOS-4677
> URL: https://issues.apache.org/jira/browse/MESOS-4677
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.27
> Reporter: Bernd Mathiske
> Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may occasionally fail
> elsewhere too. Unfortunately, it tends to fail only when --verbose is not set.
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8] Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}