[ 
https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163417#comment-15163417
 ] 

Joseph Wu commented on MESOS-4677:
----------------------------------

My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} call comes right
after we isolate the test process by writing its PID to {{cgroup.procs}}.
Underneath, the cgroups API probably blocks the write from completing until the
cgroups are updated.
# We call {{os::close}} on a parent pipe to signal the test process to
{{exec}}.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change, since {{exec}} doesn't change the PID. But
there may be a race between the kernel updating the thread list
({{cgroup/tasks}}) and our read of {{cgroup/tasks}}.

We can do one of the following:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}} 
to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Perform some operation on the test process that confirms its {{exec}} has
completed. For example, we can write to the {{cat}} test process and read back
the echoed result.
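
The third option could look roughly like the following self-contained POSIX
sketch. It is not the actual test code: {{launchCatAndHandshake}} and its pipe
layout are hypothetical, and it assumes {{cat}} is on the {{PATH}}. The child
blocks on a "go" pipe (standing in for the parent pipe from step 2), then
{{exec}}s {{cat}}. The parent closes the pipe, writes a probe string, and waits
for the echo. Only the exec'd {{cat}} can echo, so receiving the probe proves
the {{exec}} has finished; under the hypothesis above, that is the point after
which a {{usage()}} call should see a stable thread count.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: forks a child that waits until the parent closes the
// "go" pipe, then execs `cat`. Returns whatever `cat` echoed back for `probe`.
// Keep `probe` small so the write cannot fill the pipe buffer.
std::string launchCatAndHandshake(const std::string& probe) {
  int go[2], in[2], out[2];
  if (pipe(go) != 0 || pipe(in) != 0 || pipe(out) != 0) return "";

  pid_t pid = fork();
  if (pid < 0) return "";

  if (pid == 0) {
    // Child: block until every write end of `go` is closed (read returns 0).
    close(go[1]);
    char c;
    while (read(go[0], &c, 1) > 0) {}
    close(go[0]);

    // Wire the pipes to cat's stdin/stdout, then exec (PID is unchanged).
    dup2(in[0], STDIN_FILENO);
    dup2(out[1], STDOUT_FILENO);
    close(in[0]); close(in[1]); close(out[0]); close(out[1]);
    execlp("cat", "cat", static_cast<char*>(nullptr));
    _exit(127);  // Only reached if exec failed.
  }

  // Parent: drop the ends only the child uses.
  close(go[0]);
  close(in[0]);
  close(out[1]);

  // (In the real test, the child would be isolated into the cgroup here.)
  close(go[1]);  // Signal the child to exec.

  // Handshake: instead of sampling usage immediately, wait for the echo.
  std::string echoed;
  if (write(in[1], probe.data(), probe.size()) ==
      static_cast<ssize_t>(probe.size())) {
    char buf[256];
    while (echoed.size() < probe.size()) {
      ssize_t n = read(out[0], buf, sizeof(buf));
      if (n <= 0) break;
      echoed.append(buf, static_cast<size_t>(n));
    }
  }
  // Here the exec has completed; a usage() sample would go at this point.

  close(in[1]);  // EOF on cat's stdin makes it exit.
  close(out[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return echoed;
}
```

Compared with a fixed sleep, the handshake waits only as long as needed and
does not depend on cgroup-internal locking.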

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> -----------------------------------------------------------
>
>                 Key: MESOS-4677
>                 URL: https://issues.apache.org/jira/browse/MESOS-4677
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.27
>            Reporter: Bernd Mathiske
>              Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere 
> sometimes. Unfortunately, it tends to only fail when --verbose is not set. 
> The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN      ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
