Hi Luciano,

On Wed, Sep 14, 2022 at 07:24:07AM +0200, Luciano Mannucci wrote:
> hello all!
>
> I have a virtual machine running under kvm who started hanging giving
> this message just before it dies:
>
> kernel:[ 296.013011] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [swapper/0:0]
>
> This happens only on high i/o load.
> The other virtual machines are all running with no problems.
> What should I do?

The message actually means that CPU#0 spent 22 seconds in kernel code
without getting back to the scheduler. Despite its name, swapper/0 is the
kernel's idle task and is not related to swap space. In a VM this can
happen when the guest is stalled on I/O, for example when another process
is using the entire I/O bandwidth to the disk.

I had similar issues with my desktop PC. It turned out this was related
to the 32GB of RAM in my machine.

When a process writes files, the kernel caches the data first and
performs the actual disk writes later, depending on cache fill level and
elapsed time. When a process produces data very fast, the cache keeps
growing even while the kernel is already writing data out to disk, and at
some point an internal kernel threshold (/proc/sys/vm/dirty_ratio) is
hit. At that point the kernel blocks all processes writing to disk and
flushes the cache content to the disk. If you have a lot of RAM, this
flushing can take a long time (seconds to minutes). Machines with a lot
of RAM are affected because the threshold is by default set as a
percentage of RAM.

I mitigated this by reconfiguring the so-called background write
threshold:

cat /etc/sysctl.d/tuning.conf

# The following settings are to avoid long application stalls when
# writing large files to disk. They lower the amount of write
# cached data in RAM until actual writing occurs. This will prevent
# the system from writing data in large chunks while everything
# else blocks. So this improves the latency of the desktop.

# The values are by default computed as a fraction of the main memory,
# which results in fairly large cached unwritten data on high memory
# systems.

# Start background writing when more than 64MB of data are in the write cache.
# This value is tuned for the write performance of an HDD (~100MB/s).
vm.dirty_background_bytes=67108864
# Block writing processes once more than 256MB of data are in the write cache.
vm.dirty_bytes=268435456

Maybe this additional information is helpful:

https://forum.proxmox.com/threads/io-performance-tuning.15893/
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Hope that helps,
cheers,
Andreas

-- 
gnuPG keyid: 8C2BAF51
fingerprint: 28EE 8438 E688 D992 3661 C753 90B3 BAAA 8C2B AF51
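
P.S. To put rough numbers on the explanation above, here is a small Python sketch that compares ratio-based thresholds with the fixed byte values from my tuning.conf. It is only an approximation under stated assumptions: the ratios of 20% (dirty_ratio) and 10% (dirty_background_ratio) are assumed defaults and may differ on your distribution, the kernel really applies the ratio to available memory rather than total RAM, and the 100 MiB/s disk throughput is an assumed figure for a typical HDD. The function names are made up for this illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def ratio_threshold_bytes(ram_bytes, ratio_percent):
    """Approximate a ratio-based threshold as a percentage of total RAM.

    Simplification: the kernel uses available memory, not total RAM.
    """
    return ram_bytes * ratio_percent // 100


def flush_seconds(dirty_bytes, disk_bytes_per_sec):
    """Time needed to write dirty_bytes at a constant disk throughput."""
    return dirty_bytes / disk_bytes_per_sec


ram = 32 * GIB              # RAM size of the desktop PC mentioned above
disk_speed = 100 * MIB      # assumed HDD write throughput per second

# Assumed default ratios (check /proc/sys/vm/ on your own system).
default_dirty = ratio_threshold_bytes(ram, 20)
default_background = ratio_threshold_bytes(ram, 10)

# Fixed values taken from tuning.conf.
tuned_background = 67108864
tuned_dirty = 268435456

for label, value in [
    ("ratio-based dirty threshold", default_dirty),
    ("ratio-based background threshold", default_background),
    ("tuned vm.dirty_bytes", tuned_dirty),
    ("tuned vm.dirty_background_bytes", tuned_background),
]:
    print(f"{label}: {value / MIB:.1f} MiB, "
          f"flush takes about {flush_seconds(value, disk_speed):.1f} s")
```

Under these assumptions the ratio-based dirty threshold on a 32GB machine is several GiB, which takes on the order of a minute to flush, while the tuned value flushes in a few seconds.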
_______________________________________________
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng