-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9145/
-----------------------------------------------------------

(Updated Feb. 25, 2013, 7:17 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebased off trunk.


Description
-------

This implements resource collection for the cgroups isolation module.

From the Red Hat documentation:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuacct.html

// cpuacct.usage
// reports the total CPU time (in nanoseconds) consumed by all tasks in this
// cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

// cpuacct.stat
// reports the user and system CPU time consumed by all tasks in this cgroup
// (including tasks lower in the hierarchy) in the following way:
// user — CPU time consumed by tasks in user mode.
// system — CPU time consumed by tasks in system (kernel) mode.
// CPU time is reported in the units defined by the USER_HZ variable.
Since USER_HZ is typically 100, the granularity here is only 10 ms.

// cpuacct.usage_percpu
// reports the CPU time (in nanoseconds) consumed on each CPU by all tasks in
// this cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

I've used cpuacct.stat since, AFAICT, it can't be reset to 0.
However, cpuacct.stat has fairly coarse granularity (10 ms when USER_HZ is
100); see the testing notes below.
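
For illustration, here is a minimal sketch of reading the two counters from
the text of cpuacct.stat and converting ticks to seconds. The names CpuStat,
parseCpuacctStat and ticksToSeconds are hypothetical and are not the
cgroups::stat API added by this patch. The sketch assumes the caller supplies
the tick rate, which on Linux would normally come from sysconf(_SC_CLK_TCK).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for the two counters in cpuacct.stat,
// both expressed in USER_HZ ticks.
struct CpuStat {
  uint64_t user = 0;
  uint64_t system = 0;
};

// Parses text of the form "user 111\nsystem 246\n".
// Keys other than "user" and "system" are ignored.
CpuStat parseCpuacctStat(const std::string& contents) {
  CpuStat stat;
  std::istringstream in(contents);
  std::string key;
  uint64_t value = 0;
  while (in >> key >> value) {
    if (key == "user") {
      stat.user = value;
    } else if (key == "system") {
      stat.system = value;
    }
  }
  return stat;
}

// Converts a tick count to seconds. ticksPerSecond is assumed to be the
// value of sysconf(_SC_CLK_TCK), typically 100, giving 10 ms resolution.
double ticksToSeconds(uint64_t ticks, long ticksPerSecond) {
  assert(ticksPerSecond > 0);
  return static_cast<double>(ticks) / static_cast<double>(ticksPerSecond);
}
```

Passing the tick rate in as a parameter keeps the conversion testable without
depending on the host's USER_HZ.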


This addresses bug MESOS-324.
    https://issues.apache.org/jira/browse/MESOS-324


Diffs (updated)
-----

  src/linux/cgroups.hpp 1f701f3bbbe06ddf84768c68b529aba4659c19be 
  src/linux/cgroups.cpp f3823622631363a035f6c552344fa704e00ab255 
  src/slave/cgroups_isolation_module.cpp a2eba6f96f5d8a4b1257571aa29e37c5682aab8d 
  src/tests/cgroups_tests.cpp b219906374764e91f1a5268469ae92dd0fe08e53 

Diff: https://reviews.apache.org/r/9145/diff/


Testing
-------

Added tests for cgroups::stat.

End to end testing using the webui.

NOTES for cpuacct.stat:
$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage
4672471833
--> 4672471833ns = 4.67 seconds

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage_percpu
831220060 463800214 319016010 184325849 840595741 441855678 294660045 160799890 
240361561 197829862 130045719 56978804 227972655 193743493 98604097 70557562 
--> 831220060+463800214+319016010+184325849+840595741+441855678+294660045+160799890+240361561+197829862+130045719+56978804+227972655+193743493+98604097+70557562
 = 4752367240ns = 4.75 seconds
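
The per-CPU sum above can be reproduced with a short sketch. The helper names
sumPerCpuNanos and nanosToSeconds are hypothetical and not part of this patch;
the sketch assumes the file contents are whitespace-separated nanosecond
counters, one per CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Sums the whitespace-separated per-CPU nanosecond counters found in the
// contents of cpuacct.usage_percpu.
uint64_t sumPerCpuNanos(const std::string& contents) {
  std::istringstream in(contents);
  uint64_t total = 0;
  uint64_t value = 0;
  while (in >> value) {
    total += value;
  }
  // The loop should stop only at end of input; stopping earlier means a
  // token was not a number.
  assert(in.eof());
  return total;
}

// Converts nanoseconds to seconds.
double nanosToSeconds(uint64_t nanos) {
  return static_cast<double>(nanos) / 1e9;
}
```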

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.stat
user 111
system 246
--> 1.11 + 2.46 = 3.57 seconds

Since cpuacct.stat reports only the user + system times, its total (3.57
seconds) is lower than the totals from cpuacct.usage (4.67 seconds) and
cpuacct.usage_percpu (4.75 seconds). I'm guessing the nanosecond counters may
include other CPU times, e.g. steal or guest?

I think user + system is a good measurement.


Thanks,

Ben Mahler
