** Description changed:

- On scobee-kernel(arm64) with hirsute:linux(5.11.0-41.45) for
- sru-20211108 there are several reports about the sched domain not
- covering the full range. The same does not happen on kuzzle. But 32 is a
- bit of a suspicious number.
+ [Impact]
+ The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920
+ server that has 4 NUMA nodes:
+   https://launchpad.net/bugs/1951289
  
-   Running tests.......
-   cpuset_sched_domains 1 TINFO: CPUs are numbered continuously starting at 0 (0-127)
-   cpuset_sched_domains 1 TINFO: Nodes are numbered continuously starting at 0 (0-3)
-   cpuset_sched_domains 1 TINFO: root group load balance test
-   cpuset_sched_domains 1 TINFO:      sched load balance: 0
-   cpuset_sched_domains 1 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 1 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 3 TINFO: root group load balance test
-   cpuset_sched_domains 3 TINFO:      sched load balance: 1
-   cpuset_sched_domains 3 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 3 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 5 TINFO: root group load balance test
-   cpuset_sched_domains 5 TINFO:      sched load balance: 0
-   cpuset_sched_domains 5 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 5 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 7 TINFO: root group load balance test
-   cpuset_sched_domains 7 TINFO:      sched load balance: 0
-   cpuset_sched_domains 7 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 7 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 9 TINFO: root group load balance test
-   cpuset_sched_domains 9 TINFO:      sched load balance: 1
-   cpuset_sched_domains 9 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 9 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 11 TINFO: root group load balance test
-   cpuset_sched_domains 11 TINFO:      sched load balance: 1
-   cpuset_sched_domains 11 TINFO: CPU hotplug:
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 11 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 13 TINFO: general group load balance test
-   cpuset_sched_domains 13 TINFO: root group info:
-   cpuset_sched_domains 13 TINFO:      sched load balance: 0
-   cpuset_sched_domains 13 TINFO: general group info:
-   cpuset_sched_domains 13 TINFO:      cpus: -
-   cpuset_sched_domains 13 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 13 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 15 TINFO: general group load balance test
-   cpuset_sched_domains 15 TINFO: root group info:
-   cpuset_sched_domains 15 TINFO:      sched load balance: 0
-   cpuset_sched_domains 15 TINFO: general group info:
-   cpuset_sched_domains 15 TINFO:      cpus: 1
-   cpuset_sched_domains 15 TINFO:      sched load balance: 0
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 15 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 17 TINFO: general group load balance test
-   cpuset_sched_domains 17 TINFO: root group info:
-   cpuset_sched_domains 17 TINFO:      sched load balance: 1
-   cpuset_sched_domains 17 TINFO: general group info:
-   cpuset_sched_domains 17 TINFO:      cpus: -
-   cpuset_sched_domains 17 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 17 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 19 TINFO: general group load balance test
-   cpuset_sched_domains 19 TINFO: root group info:
-   cpuset_sched_domains 19 TINFO:      sched load balance: 1
-   cpuset_sched_domains 19 TINFO: general group info:
-   cpuset_sched_domains 19 TINFO:      cpus: 1
-   cpuset_sched_domains 19 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 19 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 21 TINFO: general group load balance test
-   cpuset_sched_domains 21 TINFO: root group info:
-   cpuset_sched_domains 21 TINFO:      sched load balance: 0
-   cpuset_sched_domains 21 TINFO: general group info:
-   cpuset_sched_domains 21 TINFO:      cpus: 1,2
-   cpuset_sched_domains 21 TINFO:      sched load balance: 0
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 21 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 23 TINFO: general group load balance test
-   cpuset_sched_domains 23 TINFO: root group info:
-   cpuset_sched_domains 23 TINFO:      sched load balance: 0
-   cpuset_sched_domains 23 TINFO: general group info:
-   cpuset_sched_domains 23 TINFO:      cpus: 1,2
-   cpuset_sched_domains 23 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TPASS  :  check_sched_domains passed
-   cpuset_sched_domains 23 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 25 TINFO: general group load balance test
-   cpuset_sched_domains 25 TINFO: root group info:
-   cpuset_sched_domains 25 TINFO:      sched load balance: 0
-   cpuset_sched_domains 25 TINFO: general group info:
-   cpuset_sched_domains 25 TINFO:      cpus: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
-   cpuset_sched_domains 25 TINFO:      sched load balance: 1
-   cpuset_check_domains    1  TFAIL  :  cpuset_sched_domains_check.c:110: cpu32's sched domain is wrong(Domain: 0-127, CPU's Sched Domain: 0-95).
-   cpuset_sched_domains 25 TFAIL: partition sched domains failed.
-   cpuset_sched_domains 27 TINFO: general group load balance test
-   cpuset_sched_domains 27 TINFO: root group info:
-   cpuset_sched_domains 27 TINFO:      sched load balance: 0
-   cpuset_sched_domains 27 TINFO: general group1 info:
-   cpuset_sched_domains 27 TINFO:      cpus: 1
-   cpuset_sched_domains 27 TINFO:      sched load balance: 1
-   cpuset_sched_domains 27 TINFO: general group2 info:
-   cpuset_sched_domains 27 TINFO:      cpus: 0
-   cpuset_sched_domains 27 TINFO:      sched load balance: 1
-   cpuset_sched_domains 27 TINFO: CPU hotplug: none
-   cpuset_sched_domains 27 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 29 TINFO: general group load balance test
-   cpuset_sched_domains 29 TINFO: root group info:
-   cpuset_sched_domains 29 TINFO:      sched load balance: 0
-   cpuset_sched_domains 29 TINFO: general group1 info:
-   cpuset_sched_domains 29 TINFO:      cpus: 1,2
-   cpuset_sched_domains 29 TINFO:      sched load balance: 1
-   cpuset_sched_domains 29 TINFO: general group2 info:
-   cpuset_sched_domains 29 TINFO:      cpus: 0-3
-   cpuset_sched_domains 29 TINFO:      sched load balance: 0
-   cpuset_sched_domains 29 TINFO: CPU hotplug: none
-   cpuset_sched_domains 29 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 31 TINFO: general group load balance test
-   cpuset_sched_domains 31 TINFO: root group info:
-   cpuset_sched_domains 31 TINFO:      sched load balance: 0
-   cpuset_sched_domains 31 TINFO: general group1 info:
-   cpuset_sched_domains 31 TINFO:      cpus: 1,2
-   cpuset_sched_domains 31 TINFO:      sched load balance: 1
-   cpuset_sched_domains 31 TINFO: general group2 info:
-   cpuset_sched_domains 31 TINFO:      cpus: 0,3
-   cpuset_sched_domains 31 TINFO:      sched load balance: 1
-   cpuset_sched_domains 31 TINFO: CPU hotplug: none
-   cpuset_sched_domains 31 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 33 TINFO: general group load balance test
-   cpuset_sched_domains 33 TINFO: root group info:
-   cpuset_sched_domains 33 TINFO:      sched load balance: 0
-   cpuset_sched_domains 33 TINFO: general group1 info:
-   cpuset_sched_domains 33 TINFO:      cpus: 1,2
-   cpuset_sched_domains 33 TINFO:      sched load balance: 1
-   cpuset_sched_domains 33 TINFO: general group2 info:
-   cpuset_sched_domains 33 TINFO:      cpus: 1,3
-   cpuset_sched_domains 33 TINFO:      sched load balance: 1
-   cpuset_sched_domains 33 TINFO: CPU hotplug: none
-   cpuset_sched_domains 33 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 35 TINFO: general group load balance test
-   cpuset_sched_domains 35 TINFO: root group info:
-   cpuset_sched_domains 35 TINFO:      sched load balance: 0
-   cpuset_sched_domains 35 TINFO: general group1 info:
-   cpuset_sched_domains 35 TINFO:      cpus: 1,2
-   cpuset_sched_domains 35 TINFO:      sched load balance: 1
-   cpuset_sched_domains 35 TINFO: general group2 info:
-   cpuset_sched_domains 35 TINFO:      cpus: 1,3
-   cpuset_sched_domains 35 TINFO:      sched load balance: 1
-   cpuset_sched_domains 35 TINFO: CPU hotplug: offline
-   cpuset_sched_domains 35 TPASS: partition sched domains succeeded.
-   cpuset_sched_domains 37 TINFO: general group load balance test
-   cpuset_sched_domains 37 TINFO: root group info:
-   cpuset_sched_domains 37 TINFO:      sched load balance: 0
-   cpuset_sched_domains 37 TINFO: general group1 info:
-   cpuset_sched_domains 37 TINFO:      cpus: 1,2
-   cpuset_sched_domains 37 TINFO:      sched load balance: 1
-   cpuset_sched_domains 37 TINFO: general group2 info:
-   cpuset_sched_domains 37 TINFO:      cpus: 1,3
-   cpuset_sched_domains 37 TINFO:      sched load balance: 1
-   cpuset_sched_domains 37 TINFO: CPU hotplug: online
-   cpuset_sched_domains 37 TPASS: partition sched domains succeeded.
-   INFO: ltp-pan reported some tests FAIL
-   LTP Version: 20210927
-   INFO: Test end time: Sat Nov  6 19:28:17 UTC 2021
+ This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
+ CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
+ I assume this means the scheduler is making suboptimal decisions about
+ where to place/move processes.
+ 
+ [Test Case]
+ On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the
+ domain3 (fourth-level) scheduling domain:
+ 
+ ubuntu@d06-4:~$ grep domain2 /proc/schedstat  | wc -l
+ 128
+ ubuntu@d06-4:~$ grep domain3 /proc/schedstat  | wc -l
+ 64
+ ubuntu@d06-4:~$ 
+ 
+ [What Could Go Wrong]
+ This changes the code used for populating sched domains, so it could regress
+ other systems, potentially leading to poor scheduling characteristics (higher
+ latencies, lower overall throughput, etc.).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1951289

Title:
  ubuntu_ltp_controllers:cpuset_sched_domains: tests 3,9,11,17,19,25
  report incorrect sched domain for cpu#32

Status in kunpeng920:
  In Progress
Status in kunpeng920 ubuntu-18.04 series:
  In Progress
Status in kunpeng920 ubuntu-18.04-hwe series:
  Fix Committed
Status in kunpeng920 ubuntu-20.04 series:
  Fix Committed
Status in kunpeng920 upstream-kernel series:
  Fix Released
Status in ubuntu-kernel-tests:
  Invalid
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Hirsute:
  Won't Fix

Bug description:
  [Impact]
  The LTP cpuset_sched_domains test, authored by Miao Xie, fails on a Kunpeng920
  server that has 4 NUMA nodes:
    https://launchpad.net/bugs/1951289

  This does appear to be a real bug. /proc/schedstat displays 4 domain levels for
  CPUs on 2 of the nodes, but only 3 levels for the other 2 (see below).
  I assume this means the scheduler is making suboptimal decisions about
  where to place/move processes.
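
The per-node difference in domain levels can be checked by grouping CPUs by how many domainN lines follow each cpuN line in /proc/schedstat. The following is a minimal Python sketch, not part of LTP or the kernel tree; the function names are hypothetical, and it assumes the layout in which every "cpuN ..." line is followed by that CPU's "domainN ..." lines.

```python
import re
from collections import defaultdict


def domain_levels_per_cpu(schedstat_text):
    """Map each CPU number to the count of domainN lines listed under it."""
    levels = {}
    current_cpu = None
    for line in schedstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        cpu_match = re.fullmatch(r"cpu(\d+)", fields[0])
        if cpu_match:
            # A new per-CPU section starts; its domain lines follow.
            current_cpu = int(cpu_match.group(1))
            levels[current_cpu] = 0
        elif re.fullmatch(r"domain\d+", fields[0]) and current_cpu is not None:
            levels[current_cpu] += 1
    return levels


def cpus_by_level_count(levels):
    """Group CPU numbers by their number of domain levels."""
    groups = defaultdict(list)
    for cpu, count in sorted(levels.items()):
        groups[count].append(cpu)
    return dict(groups)
```

Feeding the contents of /proc/schedstat to domain_levels_per_cpu() and then to cpus_by_level_count() should, on an affected system, presumably yield two groups: one with 4 levels and one with 3.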

  [Test Case]
  On a 128-core Kunpeng 920 system, observe that half the CPUs are missing the
  domain3 (fourth-level) scheduling domain:

  ubuntu@d06-4:~$ grep domain2 /proc/schedstat  | wc -l
  128
  ubuntu@d06-4:~$ grep domain3 /proc/schedstat  | wc -l
  64
  ubuntu@d06-4:~$ 
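
The LTP failure in the original report compares an expected span (0-127) with the span actually reported for cpu32 (0-95). To inspect spans by hand, the second field of a domainN line can be decoded into a CPU list. This is a sketch under the assumption that the field is the kernel's usual hex cpumask format (comma-separated 32-bit groups, most significant first); the helper names are hypothetical.

```python
def cpumask_to_cpus(mask):
    """Decode a hex cpumask such as '00000000,ffffffff' into CPU numbers."""
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def format_ranges(cpus):
    """Render an ascending CPU list compactly, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(
        str(start) if start == end else f"{start}-{end}" for start, end in ranges
    )
```

For example, format_ranges(cpumask_to_cpus("00000000,ffffffff,ffffffff,ffffffff")) gives "0-95", the truncated span seen in the test output.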

  [What Could Go Wrong]
  This changes the code used for populating sched domains, so it could regress
  other systems, potentially leading to poor scheduling characteristics (higher
  latencies, lower overall throughput, etc.).


