** Description changed:

- On scobee-kernel(arm64) with hirsute:linux(5.11.0-41.45) for
- sru-20211108 there are several reports about the sched domain not
- covering the full range. The same does not happen on kuzzle. But 32 is a
- bit of a suspicious number.
+ [Impact]
+ The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920
+ server that has 4 NUMA nodes:
+ https://launchpad.net/bugs/1951289
- Running tests.......
- cpuset_sched_domains 1 TINFO: CPUs are numbered continuously starting at 0 (0-127)
- cpuset_sched_domains 1 TINFO: Nodes are numbered continuously starting at 0 (0-3)
- cpuset_sched_domains 1 TINFO: root group load balance test
- cpuset_sched_domains 1 TINFO: sched load balance: 0
- cpuset_sched_domains 1 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 1 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 3 TINFO: root group load balance test
- cpuset_sched_domains 3 TINFO: sched load balance: 1
- cpuset_sched_domains 3 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 3 TFAIL: partition sched domains failed.
- cpuset_sched_domains 5 TINFO: root group load balance test
- cpuset_sched_domains 5 TINFO: sched load balance: 0
- cpuset_sched_domains 5 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 5 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 7 TINFO: root group load balance test
- cpuset_sched_domains 7 TINFO: sched load balance: 0
- cpuset_sched_domains 7 TINFO: CPU hotplug:
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 7 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 9 TINFO: root group load balance test
- cpuset_sched_domains 9 TINFO: sched load balance: 1
- cpuset_sched_domains 9 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 9 TFAIL: partition sched domains failed.
- cpuset_sched_domains 11 TINFO: root group load balance test
- cpuset_sched_domains 11 TINFO: sched load balance: 1
- cpuset_sched_domains 11 TINFO: CPU hotplug:
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 11 TFAIL: partition sched domains failed.
- cpuset_sched_domains 13 TINFO: general group load balance test
- cpuset_sched_domains 13 TINFO: root group info:
- cpuset_sched_domains 13 TINFO: sched load balance: 0
- cpuset_sched_domains 13 TINFO: general group info:
- cpuset_sched_domains 13 TINFO: cpus: -
- cpuset_sched_domains 13 TINFO: sched load balance: 1
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 13 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 15 TINFO: general group load balance test
- cpuset_sched_domains 15 TINFO: root group info:
- cpuset_sched_domains 15 TINFO: sched load balance: 0
- cpuset_sched_domains 15 TINFO: general group info:
- cpuset_sched_domains 15 TINFO: cpus: 1
- cpuset_sched_domains 15 TINFO: sched load balance: 0
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 15 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 17 TINFO: general group load balance test
- cpuset_sched_domains 17 TINFO: root group info:
- cpuset_sched_domains 17 TINFO: sched load balance: 1
- cpuset_sched_domains 17 TINFO: general group info:
- cpuset_sched_domains 17 TINFO: cpus: -
- cpuset_sched_domains 17 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 17 TFAIL: partition sched domains failed.
- cpuset_sched_domains 19 TINFO: general group load balance test
- cpuset_sched_domains 19 TINFO: root group info:
- cpuset_sched_domains 19 TINFO: sched load balance: 1
- cpuset_sched_domains 19 TINFO: general group info:
- cpuset_sched_domains 19 TINFO: cpus: 1
- cpuset_sched_domains 19 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 19 TFAIL: partition sched domains failed.
- cpuset_sched_domains 21 TINFO: general group load balance test
- cpuset_sched_domains 21 TINFO: root group info:
- cpuset_sched_domains 21 TINFO: sched load balance: 0
- cpuset_sched_domains 21 TINFO: general group info:
- cpuset_sched_domains 21 TINFO: cpus: 1,2
- cpuset_sched_domains 21 TINFO: sched load balance: 0
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 21 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 23 TINFO: general group load balance test
- cpuset_sched_domains 23 TINFO: root group info:
- cpuset_sched_domains 23 TINFO: sched load balance: 0
- cpuset_sched_domains 23 TINFO: general group info:
- cpuset_sched_domains 23 TINFO: cpus: 1,2
- cpuset_sched_domains 23 TINFO: sched load balance: 1
- cpuset_check_domains 1 TPASS : check_sched_domains passed
- cpuset_sched_domains 23 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 25 TINFO: general group load balance test
- cpuset_sched_domains 25 TINFO: root group info:
- cpuset_sched_domains 25 TINFO: sched load balance: 0
- cpuset_sched_domains 25 TINFO: general group info:
- cpuset_sched_domains 25 TINFO: cpus: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
- cpuset_sched_domains 25 TINFO: sched load balance: 1
- cpuset_check_domains 1 TFAIL : cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
- cpuset_sched_domains 25 TFAIL: partition sched domains failed.
- cpuset_sched_domains 27 TINFO: general group load balance test
- cpuset_sched_domains 27 TINFO: root group info:
- cpuset_sched_domains 27 TINFO: sched load balance: 0
- cpuset_sched_domains 27 TINFO: general group1 info:
- cpuset_sched_domains 27 TINFO: cpus: 1
- cpuset_sched_domains 27 TINFO: sched load balance: 1
- cpuset_sched_domains 27 TINFO: general group2 info:
- cpuset_sched_domains 27 TINFO: cpus: 0
- cpuset_sched_domains 27 TINFO: sched load balance: 1
- cpuset_sched_domains 27 TINFO: CPU hotplug: none
- cpuset_sched_domains 27 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 29 TINFO: general group load balance test
- cpuset_sched_domains 29 TINFO: root group info:
- cpuset_sched_domains 29 TINFO: sched load balance: 0
- cpuset_sched_domains 29 TINFO: general group1 info:
- cpuset_sched_domains 29 TINFO: cpus: 1,2
- cpuset_sched_domains 29 TINFO: sched load balance: 1
- cpuset_sched_domains 29 TINFO: general group2 info:
- cpuset_sched_domains 29 TINFO: cpus: 0-3
- cpuset_sched_domains 29 TINFO: sched load balance: 0
- cpuset_sched_domains 29 TINFO: CPU hotplug: none
- cpuset_sched_domains 29 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 31 TINFO: general group load balance test
- cpuset_sched_domains 31 TINFO: root group info:
- cpuset_sched_domains 31 TINFO: sched load balance: 0
- cpuset_sched_domains 31 TINFO: general group1 info:
- cpuset_sched_domains 31 TINFO: cpus: 1,2
- cpuset_sched_domains 31 TINFO: sched load balance: 1
- cpuset_sched_domains 31 TINFO: general group2 info:
- cpuset_sched_domains 31 TINFO: cpus: 0,3
- cpuset_sched_domains 31 TINFO: sched load balance: 1
- cpuset_sched_domains 31 TINFO: CPU hotplug: none
- cpuset_sched_domains 31 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 33 TINFO: general group load balance test
- cpuset_sched_domains 33 TINFO: root group info:
- cpuset_sched_domains 33 TINFO: sched load balance: 0
- cpuset_sched_domains 33 TINFO: general group1 info:
- cpuset_sched_domains 33 TINFO: cpus: 1,2
- cpuset_sched_domains 33 TINFO: sched load balance: 1
- cpuset_sched_domains 33 TINFO: general group2 info:
- cpuset_sched_domains 33 TINFO: cpus: 1,3
- cpuset_sched_domains 33 TINFO: sched load balance: 1
- cpuset_sched_domains 33 TINFO: CPU hotplug: none
- cpuset_sched_domains 33 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 35 TINFO: general group load balance test
- cpuset_sched_domains 35 TINFO: root group info:
- cpuset_sched_domains 35 TINFO: sched load balance: 0
- cpuset_sched_domains 35 TINFO: general group1 info:
- cpuset_sched_domains 35 TINFO: cpus: 1,2
- cpuset_sched_domains 35 TINFO: sched load balance: 1
- cpuset_sched_domains 35 TINFO: general group2 info:
- cpuset_sched_domains 35 TINFO: cpus: 1,3
- cpuset_sched_domains 35 TINFO: sched load balance: 1
- cpuset_sched_domains 35 TINFO: CPU hotplug: offline
- cpuset_sched_domains 35 TPASS: partition sched domains succeeded.
- cpuset_sched_domains 37 TINFO: general group load balance test
- cpuset_sched_domains 37 TINFO: root group info:
- cpuset_sched_domains 37 TINFO: sched load balance: 0
- cpuset_sched_domains 37 TINFO: general group1 info:
- cpuset_sched_domains 37 TINFO: cpus: 1,2
- cpuset_sched_domains 37 TINFO: sched load balance: 1
- cpuset_sched_domains 37 TINFO: general group2 info:
- cpuset_sched_domains 37 TINFO: cpus: 1,3
- cpuset_sched_domains 37 TINFO: sched load balance: 1
- cpuset_sched_domains 37 TINFO: CPU hotplug: online
- cpuset_sched_domains 37 TPASS: partition sched domains succeeded.
- INFO: ltp-pan reported some tests FAIL
- LTP Version: 20210927
- INFO: Test end time: Sat Nov 6 19:28:17 UTC 2021
+ This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
+ CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
+ I assume this means the scheduler is making suboptimal decisions about
+ where to place/move processes.
+
+ [Test Case]
+ On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the domain3 scheduling domain (the fourth level, counting from domain0):
+
+ ubuntu@d06-4:~$ grep domain2 /proc/schedstat | wc -l
+ 128
+ ubuntu@d06-4:~$ grep domain3 /proc/schedstat | wc -l
+ 64
+ ubuntu@d06-4:~$
+
+ [What Could Go Wrong]
+ This changes the code used for populating sched domains, so it could break on other systems, potentially leading to poor scheduling characteristics (higher latencies, lower overall throughput, etc.).
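
The grep counts in the test case show how many CPUs have a given domain, but not which depth each CPU ends up with. A minimal sketch of a per-CPU check follows. It assumes the usual /proc/schedstat layout, where each "cpuN" line is followed by that CPU's "domainN" lines. The function names are made up for illustration and are not part of LTP or the kernel.

```python
import re
from collections import Counter


def domain_levels_per_cpu(schedstat_text):
    """Map each CPU name (e.g. 'cpu32') to the number of domainN lines under it.

    Assumes every 'cpuN' line is followed by that CPU's 'domainN' lines.
    """
    levels = {}
    current_cpu = None
    for line in schedstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if re.fullmatch(r"cpu\d+", fields[0]):
            current_cpu = fields[0]
            levels[current_cpu] = 0
        elif re.fullmatch(r"domain\d+", fields[0]) and current_cpu is not None:
            levels[current_cpu] += 1
    return levels


def summarize(levels):
    """Count how many CPUs have each domain depth."""
    return Counter(levels.values())


def read_levels(path="/proc/schedstat"):
    """Convenience wrapper that reads the live file (Linux only)."""
    with open(path) as f:
        return domain_levels_per_cpu(f.read())
```

Calling summarize(read_levels()) on the affected machine should, going by the counts above, report two different depths with 64 CPUs each, while a correctly populated topology would be expected to give all 128 CPUs the same depth.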
--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1951289

Title: ubuntu_ltp_controllers:cpuset_sched_domains: tests 3,9,11,17,19,25 report incorrect sched domain for cpu#32

Status in kunpeng920: In Progress
Status in kunpeng920 ubuntu-18.04 series: In Progress
Status in kunpeng920 ubuntu-18.04-hwe series: Fix Committed
Status in kunpeng920 ubuntu-20.04 series: Fix Committed
Status in kunpeng920 upstream-kernel series: Fix Released
Status in ubuntu-kernel-tests: Invalid
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: In Progress
Status in linux source package in Focal: Fix Committed
Status in linux source package in Hirsute: Won't Fix

Bug description:

[Impact]
The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920 server that has 4 NUMA nodes:
https://launchpad.net/bugs/1951289

This does appear to be a real bug. /proc/schedstat displays 4 domain levels for CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below). I assume this means the scheduler is making suboptimal decisions about where to place/move processes.

[Test Case]
On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the domain3 scheduling domain (the fourth level, counting from domain0):

ubuntu@d06-4:~$ grep domain2 /proc/schedstat | wc -l
128
ubuntu@d06-4:~$ grep domain3 /proc/schedstat | wc -l
64
ubuntu@d06-4:~$

[What Could Go Wrong]
This changes the code used for populating sched domains, so it could break on other systems, potentially leading to poor scheduling characteristics (higher latencies, lower overall throughput, etc.).