Hi Michael

> Task competition inside a cgroup won't be considered as cgroup's
> competition, please try create another cgroup with dead loop on
> each CPU

Yes, you are right, but I don't think accounting only for competition
between cgroups is enough, because that figure does not reflect the
conditions inside a cgroup. We still need a proper method to evaluate
CPU competition inside a cgroup.

> Running tasks doesn't means no competition, only if that cgroup occupied
> the CPU exclusively at that moment.

I care a lot about CPU competition inside a cgroup. Without a
hierarchical wait_sum for the cgroup, all I can do is read
'/proc/$pid/schedstat' thousands of times to collect every task's
wait_sum, and that definitely takes a really long time (40ms for 8000
tasks in a container).

> No offense but I'm afraid you misunderstand the problem we try to solve
> by wait_sum, if your purpose is to have a way to tell whether there are
> sufficient CPU inside a container, please try lxcfs + top, if there are
> almost no idle and load is high, then the CPU resource is not sufficient.

Hmm, maybe I didn't make it clear. We need to dynamically adjust the
number of CPUs for a container based on the running state of the tasks
inside it. If the tasks' wait_sum is really high, we will add more CPU
cores to the container; otherwise we will take some CPUs away from it.
In a word, we want to ensure 'co-scheduling' for high-priority
containers.

> Frankly speaking this sounds like a supplement rather than a missing piece,
> although we don't rely on lxcfs and modify the kernel ourselves to support
> container environment, I still don't think such kind of solutions should be
> in kernel.

I don't care whether this value is considered a supplement or a missing
piece. I only care about how I can assess the running state inside a
container. I think lxcfs is a really good solution for improving the
visibility of container resources, but it is not good enough at the
moment.

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

We can read these procfs files inside a container, but they still cannot
reflect real-time information.
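
To make the per-task polling mentioned above concrete, here is a rough sketch of what summing wait time from '/proc/$pid/schedstat' looks like in userspace. It is only an illustration, not the tool we actually run: the function names and the `proc_root` parameter are made up for this example, and it assumes the caller already has the list of task IDs (for instance from a cgroup v1 `tasks` file). The second field of schedstat is the time the task spent waiting on a runqueue, in nanoseconds.

```python
import os


def parse_schedstat(text):
    """Return (run_ns, wait_ns, timeslices) from one schedstat line."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split()[:3])
    return run_ns, wait_ns, timeslices


def read_task_ids(tasks_file):
    """Read one task ID per line from a file such as a cgroup v1 `tasks` file.

    The path is supplied by the caller; nothing about the cgroup mount
    point is assumed here.
    """
    with open(tasks_file) as f:
        return [int(line) for line in f if line.strip()]


def sum_wait_ns(task_ids, proc_root="/proc"):
    """Sum runqueue wait time (ns) over the given tasks.

    One open/read per task, so the cost grows linearly with the number
    of tasks. Tasks that have already exited are silently skipped.
    """
    total = 0
    for tid in task_ids:
        path = os.path.join(proc_root, str(tid), "schedstat")
        try:
            with open(path) as f:
                total += parse_schedstat(f.read())[1]
        except (FileNotFoundError, ProcessLookupError, ValueError):
            # Task exited between listing and reading, or the file was empty.
            continue
    return total
```

Note that any task which exits between the listing and the read is simply dropped from the sum, so its wait time is never seen by this kind of polling.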

Please think about the following scenario: a 'rabbit' process generates
2000 tasks every 30ms, and these child tasks run for just 1~5ms and then
exit. How can we detect this thrashing workload without a hierarchical
wait_sum?

Thanks,
Yuzhoujian