-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9145/
-----------------------------------------------------------

(Updated Feb. 13, 2013, 9:48 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Description (updated)
-------

This implements resource collection for the cgroups isolation module.

From the Red Hat documentation:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuacct.html

// cpuacct.usage
// reports the total CPU time (in nanoseconds) consumed by all tasks in this
// cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

// cpuacct.stat
// reports the user and system CPU time consumed by all tasks in this cgroup
// (including tasks lower in the hierarchy) in the following way:
// user — CPU time consumed by tasks in user mode.
// system — CPU time consumed by tasks in system (kernel) mode.
// CPU time is reported in the units defined by the USER_HZ variable.
Since USER_HZ is typically 100, the granularity here is only 10 ms.

// cpuacct.usage_percpu
// reports the CPU time (in nanoseconds) consumed on each CPU by all tasks in
// this cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

I've used cpuacct.stat since AFAICT it can't be reset to 0.
However, cpuacct.stat has somewhat low granularity (10 ms at USER_HZ = 100);
see the testing notes below.
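
For reference, a minimal sketch of the tick-to-seconds conversion that
cpuacct.stat requires. This is Python for illustration only, not the C++ in
this patch. parse_cpuacct_stat is a hypothetical helper name, and the default
of 100 ticks per second is the typical USER_HZ value assumed above (on Linux,
os.sysconf("SC_CLK_TCK") should report the actual value).

```python
def parse_cpuacct_stat(text, ticks_per_second=100):
    """Convert the contents of a cpuacct.stat file to seconds.

    `text` is the raw file contents, e.g. "user 111\nsystem 246\n".
    `ticks_per_second` is USER_HZ; 100 is an assumed default, not a
    value read from the running system.
    """
    seconds = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # Skip blank or malformed lines.
        name, ticks = fields
        seconds[name] = int(ticks) / ticks_per_second
    return seconds


if __name__ == "__main__":
    # Sample contents in the same "name value" layout as cpuacct.stat.
    times = parse_cpuacct_stat("user 111\nsystem 246\n")
    print(times["user"], times["system"])
```

Each value is an integer number of ticks, so at USER_HZ = 100 no reading can
be finer than 0.01 seconds.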


This addresses bug MESOS-324.
    https://issues.apache.org/jira/browse/MESOS-324


Diffs
-----

  src/linux/cgroups.hpp 1f701f3bbbe06ddf84768c68b529aba4659c19be 
  src/linux/cgroups.cpp 03b31e7309b9dd65f00d3b0da2abb81ddaaeea43 
  src/slave/cgroups_isolation_module.cpp 63cefc33cf34eebb82db5d8448b751be8652fa36
  src/tests/cgroups_tests.cpp b219906374764e91f1a5268469ae92dd0fe08e53 

Diff: https://reviews.apache.org/r/9145/diff/


Testing (updated)
-------

Added tests for cgroups::stat.

End to end testing using the webui.

NOTES for cpuacct.stat:
$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage
4672471833
--> 4672471833ns = 4.67 seconds

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage_percpu
831220060 463800214 319016010 184325849 840595741 441855678 294660045 160799890 240361561 197829862 130045719 56978804 227972655 193743493 98604097 70557562
--> 831220060+463800214+319016010+184325849+840595741+441855678+294660045+160799890+240361561+197829862+130045719+56978804+227972655+193743493+98604097+70557562 = 4752367240ns = 4.75 seconds

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.stat
user 111
system 246
--> 1.11 + 2.46 = 3.57 seconds

Since cpuacct.stat reports only the user + system times, it shows a lower
total (3.57 seconds) than cpuacct.usage (4.67 seconds) and
cpuacct.usage_percpu (4.75 seconds). I'm guessing those totals may include
other CPU times, e.g. steal or guest?

I think user + system is a good measurement.
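
The arithmetic in the notes above can be cross-checked with a short script.
This is a sketch in Python (illustration only, not part of the patch) using
the values copied from the cat output and assuming USER_HZ = 100:

```python
NANOSECONDS_PER_SECOND = 1_000_000_000
USER_HZ = 100  # Assumed typical value; not read from the system.

# Values copied from the three cat commands in the notes.
usage_ns = 4672471833
usage_percpu_ns = [
    831220060, 463800214, 319016010, 184325849,
    840595741, 441855678, 294660045, 160799890,
    240361561, 197829862, 130045719, 56978804,
    227972655, 193743493, 98604097, 70557562,
]
stat_ticks = {"user": 111, "system": 246}

# Convert each reading to seconds.
usage_s = usage_ns / NANOSECONDS_PER_SECOND
percpu_s = sum(usage_percpu_ns) / NANOSECONDS_PER_SECOND
stat_s = sum(stat_ticks.values()) / USER_HZ

print(f"cpuacct.usage:        {usage_s:.2f} s")
print(f"cpuacct.usage_percpu: {percpu_s:.2f} s")
print(f"cpuacct.stat:         {stat_s:.2f} s")
print(f"usage - stat:         {usage_s - stat_s:.2f} s")
```

With these inputs the script reproduces the 4.67, 4.75 and 3.57 second
figures; the gap between cpuacct.usage and cpuacct.stat is about 1.1 seconds.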


Thanks,

Ben Mahler
