-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9145/
-----------------------------------------------------------
(Updated Feb. 22, 2013, 12:32 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Benh review.
Rebased off trunk.


Description
-------

This implements resource collection for the cgroups isolation module.

From the Red Hat documentation:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuacct.html

// cpuacct.usage
// reports the total CPU time (in nanoseconds) consumed by all tasks in this cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

// cpuacct.stat
// reports the user and system CPU time consumed by all tasks in this cgroup (including tasks lower in the hierarchy) in the following way:
//   user — CPU time consumed by tasks in user mode.
//   system — CPU time consumed by tasks in system (kernel) mode.
// CPU time is reported in the units defined by the USER_HZ variable.
Since USER_HZ is typically 100, the granularity here is only 10 ms.

// cpuacct.usage_percpu
// reports the CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup (including tasks lower in the hierarchy).
I don't like this control because it can be reset back to zero!

I've used cpuacct.stat since AFAICT it can't be reset to 0. However, cpuacct.stat has somewhat low granularity; see the testing comments below.


This addresses bug MESOS-324.
    https://issues.apache.org/jira/browse/MESOS-324


Diffs (updated)
-----

  src/linux/cgroups.hpp 1f701f3bbbe06ddf84768c68b529aba4659c19be
  src/linux/cgroups.cpp e7bdb7442624ac9f77df6ab87de013f39de37d32
  src/slave/cgroups_isolation_module.cpp a2eba6f96f5d8a4b1257571aa29e37c5682aab8d
  src/tests/cgroups_tests.cpp b219906374764e91f1a5268469ae92dd0fe08e53

Diff: https://reviews.apache.org/r/9145/diff/


Testing
-------

Added tests for cgroups::stat.
End to end testing using the webui.
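
For anyone following along, here is a rough sketch of how the cpuacct.stat format described above can be turned into seconds. This is not the code in the diff: parseStat, ticksToSeconds and cpuSeconds are hypothetical names used only for illustration, and the sketch assumes the file contains whitespace-separated "key value" pairs with values in USER_HZ ticks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the patch): parses the "key value"
// pairs of a cpuacct.stat file, e.g.
//   user 111
//   system 246
// into a map from key to tick count. Parsing stops at the first pair
// that cannot be read.
std::map<std::string, uint64_t> parseStat(const std::string& contents)
{
  std::map<std::string, uint64_t> result;
  std::istringstream in(contents);
  std::string key;
  uint64_t value;
  while (in >> key >> value) {
    result[key] = value;
  }
  return result;
}

// Converts a tick count to seconds. ticksPerSecond is USER_HZ; on Linux
// it can be queried with sysconf(_SC_CLK_TCK) from <unistd.h>.
double ticksToSeconds(uint64_t ticks, long ticksPerSecond)
{
  assert(ticksPerSecond > 0);
  return static_cast<double>(ticks) / static_cast<double>(ticksPerSecond);
}

// Returns user + system CPU time in seconds. A missing key counts as 0.
double cpuSeconds(const std::string& contents, long ticksPerSecond)
{
  std::map<std::string, uint64_t> stat = parseStat(contents);
  return ticksToSeconds(stat["user"] + stat["system"], ticksPerSecond);
}
```

With USER_HZ = 100, the sample in the notes below ("user 111", "system 246") comes out as (111 + 246) / 100 = 3.57 seconds, and the smallest observable step is one tick, i.e. 10 ms.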

NOTES for cpuacct.stat:

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage
4672471833
--> 4672471833ns = 4.67 seconds

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.usage_percpu
831220060 463800214 319016010 184325849 840595741 441855678 294660045 160799890 240361561 197829862 130045719 56978804 227972655 193743493 98604097 70557562
--> 831220060+463800214+319016010+184325849+840595741+441855678+294660045+160799890+240361561+197829862+130045719+56978804+227972655+193743493+98604097+70557562 = 4752367240ns = 4.75 seconds

$ cat /cgroup/mesos/framework_201302132039-2081170186-5050-60933-0001_executor_default_tag_3e1f5310-c873-42cb-9aa4-4ee4c2b9feb8/cpuacct.stat
user 111
system 246
--> 1.11 + 2.46 = 3.57 seconds

Since cpuacct.stat reports only the user + system times, it shows a lower total (3.57 seconds) than cpuacct.usage (4.67 seconds) and cpuacct.usage_percpu (4.75 seconds), which report total CPU time. I'm guessing those may be including other CPU times, e.g. steal or guest?

I think user + system is a good measurement.


Thanks,

Ben Mahler
