Hi, I am on CentOS 7 with kernel package 3.10.0-862.2.3.el7.x86_64.
I was looking at the scheduling info of my application with "cat /proc/[pid]/sched":

-------------------------------------------------------------------
se.exec_start                :      78998944.120048
se.vruntime                  :      78337609.962134
se.sum_exec_runtime          :      78337613.040860
se.nr_migrations             :                    6
nr_switches                  :                   41
nr_voluntary_switches        :                   31
nr_involuntary_switches      :                   10
se.load.weight               :                 1024
policy                       :                    0
prio                         :                  120
clock-delta                  :                   13
mm->numa_scan_seq            :                  925
numa_migrations, 0
numa_faults_memory, 0, 0, 0, 0, 1
numa_faults_memory, 1, 0, 0, 0, 1
numa_faults_memory, 0, 1, 1, 1, 0
numa_faults_memory, 1, 1, 0, 1, 9

I am trying to understand the numbers after numa_faults_memory. I dived into the source code:

https://elixir.bootlin.com/linux/v3.18.75/source/kernel/sched/debug.c

I believe line 539 prints the last 4 lines. My machine has 2 NUMA nodes, the outer loop iterates over the NUMA nodes, and the inner loop is hardcoded to run twice, so the number of lines in the output matches (2 * 2 = 4).

So my questions are:

1. What is the inner loop trying to do? Why does it loop twice?

2. What is the last number on each numa_faults_memory line (e.g. 9 on the last line)?

3. When is this counter reset? According to the comment at https://elixir.bootlin.com/linux/v3.18.75/source/include/linux/sched.h#L1577, "Exponential decaying average of faults on a per-node basis. Scheduling placement decisions are made based on the these counts. The values remain static for the duration of a PTE scan", it sounds like numa_faults_memory is reset or recomputed for each PTE scan. What is a PTE scan? What does "scheduling" refer to here? Does it mean scheduling the migration of a chunk of memory from one NUMA node to another due to NUMA balancing? If yes, does that mean that if I turn off NUMA balancing ("echo 0 > /proc/sys/kernel/numa_balancing"), the PTE scan will stop and numa_faults_memory will stay at 0 all the time (assuming there is always sufficient memory)?

4. For my machine, task_struct::numa_faults_memory[] would have 4 elements. That is where I get lost about what it is tracking: I thought it tracked the number of NUMA faults per NUMA node, yet the number of elements is always (number of NUMA nodes) * 2.

Thanks!
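P.S. To make the question concrete, here is a rough Python sketch of how I am currently reading those lines. The field names (i, node, cpu_current, home_node, nr_faults) are my assumption, taken from the arguments of the SEQ_printf call in the v3.18 debug.c linked above; the CentOS 3.10 backport may differ. The helper names parse_numa_faults and flat_index are made up for illustration, and flat_index only encodes my guess that the array is laid out as 2 * node + i.

```python
from typing import List, NamedTuple


class NumaFaultRow(NamedTuple):
    # Names assumed from the v3.18 SEQ_printf arguments; not confirmed
    # for the CentOS 3.10.0-862 kernel.
    i: int            # inner loop counter (0 or 1)
    node: int         # NUMA node id from the outer loop
    cpu_current: int
    home_node: int
    nr_faults: int    # the last number on the line


def parse_numa_faults(text: str) -> List[NumaFaultRow]:
    """Collect the 'numa_faults_memory, a, b, c, d, e' lines from sched output."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip everything except well-formed numa_faults_memory lines.
        if parts[0] != "numa_faults_memory" or len(parts) != 6:
            continue
        rows.append(NumaFaultRow(*map(int, parts[1:])))
    return rows


def flat_index(node: int, i: int) -> int:
    """Guessed position in numa_faults_memory[]: two slots per node."""
    return 2 * node + i


SAMPLE = """\
numa_migrations, 0
numa_faults_memory, 0, 0, 0, 0, 1
numa_faults_memory, 1, 0, 0, 0, 1
numa_faults_memory, 0, 1, 1, 1, 0
numa_faults_memory, 1, 1, 0, 1, 9
"""

if __name__ == "__main__":
    for row in parse_numa_faults(SAMPLE):
        print(flat_index(row.node, row.i), row)
```

With 2 nodes this gives indices 0 to 3, which is where my count of 4 elements comes from. What I cannot tell is what the two slots per node mean.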
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies