Archive containing all files/scripts/results mentioned

** Attachment added: "Archive containing all files/scripts/results mentioned"
   
https://bugs.launchpad.net/ubuntu/+source/linux-lowlatency/+bug/2023391/+attachment/5678810/+files/files.zip

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-lowlatency in Ubuntu.
https://bugs.launchpad.net/bugs/2023391

Title:
  Getting interrupts on isolated cores which causes significant jitter
  during low-latency work

Status in linux-lowlatency package in Ubuntu:
  New

Bug description:
  Summary:
  LOC, IWI, RES and CAL interrupts are observed on the isolated cores on which 
a low-latency benchmark is running. The interrupts are caused by a simple Go 
application (printing "Hello world" every second) that runs on different, 
non-isolated cores. A similar Python application doesn't cause such problems.

  Tested on Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-68-lowlatency x86_64) 
(compiled with: "Full Dynticks System (tickless)" and "No Forced Preemption 
(Server)").
  I would like to find out what causes this issue (Go itself? A kernel issue? 
Missing kernel settings/parameters? Something else?). I'm looking for help 
with hunting down the root cause!

  Reason:
  To run Go-based applications in environments where low-latency workloads are 
executed.

  Details:

  Hardware:
  2 x Intel(R) Xeon(R) Gold 6438N (32 cores each)

  BIOS:
  Hyperthreading disabled

  OS and configuration:
  Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-68-lowlatency x86_64) (compiled with: 
"Full Dynticks System (tickless)" and "No Forced Preemption (Server)" from 
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/log/?h=lowlatency-next)

  irqbalance stopped and disabled: 
  systemctl stop irqbalance.service
  systemctl disable irqbalance.service

  Based on the workload type, experiments and information found on the Internet, 
the following kernel parameters were used:
  cat /proc/cmdline
  BOOT_IMAGE=/boot/vmlinuz-5.15.0-68-lowlatency 
root=UUID=5c9c2ea3-e0c6-4dd8-ae70-57e0c0af20d3 ro ro rhgb quiet ipv6.disable=1 
audit=0 selinux=0 hugepages=256 hugepagesz=1G intel_iommu=on iommu=pt 
nmi_watchdog=0 mce=off tsc=reliable nosoftlockup hpet=disable skew_tick=1 
acpi_pad.disable=1 nowatchdog nomce numa_balancing=disable irqaffinity=0 
rcu_nocb_poll processor.max_cstate=0 clocksource=tsc nosmt nohz=on 
nohz_full=20-23 rcu_nocbs=20-23 isolcpus=nohz,domain,managed_irq,20-23
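
  For illustration only (not part of the attached archive): a minimal Python sketch for checking that the running kernel actually applied the isolation requested above. It assumes the standard sysfs files /sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full are present and use the plain "a-b,c" list format; the helper names are made up for this example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8,20-23" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, root="/sys/devices/system/cpu"):
    """Read one sysfs CPU list ("isolated", "nohz_full"); None if unavailable."""
    try:
        return parse_cpu_list((Path(root) / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    expected = parse_cpu_list("20-23")  # value taken from the command line above
    for name in ("isolated", "nohz_full"):
        print(name, read_cpu_list(name), "expected", expected)
```

  If either list differs from the expected one, the corresponding boot parameter was probably not applied as intended.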

  For every core/socket: Cx states (for x > 0) were disabled, a specific power 
governor was used and fixed uncore values were set.
  To achieve that, the power.py script from 
https://github.com/intel/CommsPowerManagement was used. See "prepare_cpus.sh" 
for the exact commands and "cpu_prepared.png" for the results.
  CPUs 20-23 are "isolated" (via the kernel parameters above); the 
benchmark/workload runs on them.

  cat /sys/devices/virtual/workqueue/cpumask
  ffffffff,ff0fffff
  (kernel threads moved away from CPU20-23)
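
  As a quick sanity check of the mask above, the following Python sketch decodes a sysfs hex cpumask into CPU numbers. It assumes the usual layout of comma-separated 32-bit groups with the most significant group first; the function name is hypothetical.

```python
def cpumask_to_cpus(mask):
    """Decode a sysfs hex cpumask ("ffffffff,ff0fffff") into a sorted list of CPUs."""
    # Groups are 32 bits each, most significant first, so joining them
    # yields one big hexadecimal number whose bit N stands for CPU N.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


allowed = cpumask_to_cpus("ffffffff,ff0fffff")
excluded = [cpu for cpu in range(64) if cpu not in allowed]
print(excluded)  # [20, 21, 22, 23]
```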

  "get_irqs.sh" - script which checks which target CPUs are permitted for a 
given IRQ source. "get_irqs_output.txt" contains the output of that script.
  "lscpu_output.txt" - contains the output of the 'lscpu' command.


  JITTER tool - Baseline
  jitter is a benchmarking tool for measuring the "jitter" in execution time 
caused by the OS and/or the underlying architecture.

  git clone https://github.com/FDio/archived-pma_tools
  cd archived-pma_tools/jitter

  Put the "run_jitter.sh" script inside the above directory.

  Run:
  make
  ./run_jitter.sh

  Results:
  - "jitter_base.txt" - output from "run_jitter.sh" script
  - "jitter_base.png" - chart created from above output 

  Comment:
  The jitter tool reports intervals and jitter in CPU core cycles. The benchmark 
runs on a 2000 MHz core (2 cycles per nanosecond), so in the graph the values 
are divided by 2 and presented in nanoseconds.
  Very stable results, no significant jitter (max jitter: 51ns) during 335 
seconds.
  No interrupts were delivered to isolated CPU20 during the benchmark.
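
  The cycle-to-nanosecond conversion used for the graphs can be written down as a small Python sketch. The 2000 MHz default comes from the comment above; the function name is made up for this example.

```python
def cycles_to_ns(cycles, core_mhz=2000):
    """Convert core cycles to nanoseconds for a core clocked at core_mhz."""
    # core_mhz / 1000 is the number of cycles per nanosecond (2 at 2000 MHz).
    return cycles * 1000 / core_mhz


print(cycles_to_ns(102))  # 51.0
```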


  JITTER tool - Python
  "hello.py" - simple Python app which prints "Hello world" every second
  "run_python_hello.sh" - script to run the Python app on a particular 
(non-isolated) core

  python3 --version
  Python 3.10.6

  In the first console "./run_python_hello.sh" was started; in the second
  console "./run_jitter.sh" was run.
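
  The attached "hello.py" and "run_python_hello.sh" are not reproduced in this message. For readers without the archive, an equivalent workload might look like the Python sketch below. It is an assumption, not the attached code: the function names are hypothetical, and pinning is done here with os.sched_setaffinity (Linux only) rather than by a wrapper script.

```python
import os
import sys
import time


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def say_hello(count=None, interval=1.0, out=sys.stdout):
    """Print "Hello world" every `interval` seconds; forever when count is None."""
    printed = 0
    while count is None or printed < count:
        print("Hello world", file=out, flush=True)
        printed += 1
        time.sleep(interval)
    return printed


# Typical use (CPU 2 is an arbitrary non-isolated core chosen for illustration):
#   pin_to_cpu(2)
#   say_hello()
```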

  Results:
  - "jitter_python.txt" - output from "run_jitter.sh" script
  - "jitter_python.png" - chart created from above output 

  Comment:
  Acceptable result: one noticeable jitter spike (1190ns); the remaining jitter 
did not exceed 60ns during 336 seconds.
  No interrupts were delivered to isolated CPU20 during the benchmark.


  
  JITTER tool - Golang
  "hello.go" - simple Golang app which prints "Hello world" every second
  "go.mod" - Go module definition
  "run_go_hello.sh" - script to run the Go app on a particular (non-isolated) core

  go version
  go version go1.20.5 linux/amd64

  In the first console the Go app was built ("go build") and started
  ("./run_go_hello.sh"); in the second console "./run_jitter.sh" was run.

  Results:
  - "jitter_go.txt" - output from "run_jitter.sh" script
  - "jitter_go.png" - chart created from above output 

  Comment:
  34 significant jitter spikes (the worst: 44961ns) during 335 seconds.
  The following interrupts were delivered to isolated CPU20 during the benchmark:
  LOC: 67
  IWI: 34
  RES: 34
  CAL: 34

  The jitter spikes appear to occur roughly every 10s (34 spikes in 335 seconds).

  Also interesting: for the idle, isolated CPU22 and CPU23 no
  interrupts were delivered during the benchmark. For CPU24 (not isolated)
  only LOC interrupts were seen (335283 of them).
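
  Per-CPU counts like the ones above can be obtained by diffing two snapshots of /proc/interrupts. The Python sketch below shows one way to do that; it is not the method used for the attached results, and the function names are hypothetical. It assumes the usual layout: a header row of CPUn columns followed by one row per interrupt source.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: {cpu_number: count}}."""
    lines = text.splitlines()
    cpus = [int(name[3:]) for name in lines[0].split()]  # header: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split():
            # Stop at the first non-numeric token or once every CPU has a count.
            if not token.isdigit() or len(counts) == len(cpus):
                break
            counts.append(int(token))
        # Rows such as ERR/MIS carry one global count instead of per-CPU counts.
        if len(counts) == len(cpus):
            table[label.strip()] = dict(zip(cpus, counts))
    return table


def delta_for_cpu(before, after, cpu):
    """Return {label: increase} for sources that fired on `cpu` between snapshots."""
    result = {}
    for label, counts in after.items():
        diff = counts.get(cpu, 0) - before.get(label, {}).get(cpu, 0)
        if diff:
            result[label] = diff
    return result
```

  Usage: read /proc/interrupts once before and once after the benchmark, pass both texts through parse_interrupts(), then call delta_for_cpu(before, after, 20) for the isolated core.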

  
  Notes:
  1. Instead of static isolation (using kernel parameters) I also tried cpuset 
with its shield turned on. Unfortunately, the results were even worse (the 
jitter was larger and more interrupts were delivered to the shielded cores); 
moreover, cset was not able to move kernel threads out of the shielded pool.
  2. I also checked this on a realtime kernel (GNU/Linux 5.15.65-rt49 x86_64 -> 
https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.65.tar.gz 
patched with 
https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patch-5.15.65-rt49.patch.gz)
 and the problem with interrupts and jitter caused by the Go app doesn't exist 
there. However, an RT kernel is not the best solution for everyone and it would 
be great to avoid jitter on the lowlatency tickless kernel as well.
  3. I also ran a lot of experiments with different kernel parameters; this 
combination seemed to be the best (however, maybe I missed something).
  4. Same situation with the Go app built using 1.19.x and 1.20.2.
  5. I'm aware that this kind of benchmark should be executed for hours, but 
for now these results are already meaningful.

  
  I'm aware of this bug submitted for the realtime kernel, 
https://bugs.launchpad.net/ubuntu-realtime/+bug/1992164, where 
https://launchpad.net/~jsalisbury assisted a lot. It helped me tune my 
parameters, but right now I'm stuck.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-lowlatency/+bug/2023391/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp