Hi Salvatore,

As an additional control, I completely uninstalled the nvidia graphics driver and repeated the kworker observations with the nouveau graphics driver on kernel 4.19.0-10-amd64. This time, there are even two kworker processes constantly running with high CPU load:

$ top
top - 12:37:20 up 10 min,  4 users,  load average: 2.79, 2.54, 1.56
Tasks: 197 total,   3 running, 194 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 24.2 sy,  0.0 ni, 74.2 id,  0.0 wa, 0.0 hi,  1.6 si,  0.0 st
MiB Mem :  15889.4 total,  13964.7 free,    626.8 used, 1297.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 14849.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  164 root      20   0       0      0      0 R  80.0   0.0   8:41.67 kworker/6:2+pm
  455 root      20   0       0      0      0 R  80.0   0.0   8:28.23 kworker/2:2+pm
   22 root      20   0       0      0      0 S  20.0   0.0   2:14.82 ksoftirqd/2
   42 root      20   0       0      0      0 S  20.0   0.0   2:08.67 ksoftirqd/6
    1 root      20   0  169644  10212   7796 S   0.0   0.1   0:01.52 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd
    7 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u16:0-event+

The stacks of the two kworker processes show the same output:

[<0>] 0xffffffffffffffff

I have attached the first 5000 lines of the trace as a compressed ASCII file, out-cut.txt.gz, and the dmesg output as a compressed ASCII file, dmesg.txt.gz.
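
A trace like this can be summarised by counting which work function is queued most often. The following is a minimal sketch, assuming the file was captured with the workqueue:workqueue_queue_work trace event described in the kernel workqueue debugging documentation, that each event line carries a "function=" field, and that the file has been decompressed first. The names count_work_functions and report are illustrative and not part of any kernel tooling.

```python
import re
import sys
from collections import Counter

# Assumption: each workqueue_queue_work event line contains a
# "function=<symbol>" field naming the queued work function.
FUNCTION_FIELD = re.compile(r"workqueue_queue_work:.*?\bfunction=(\S+)")


def count_work_functions(lines):
    """Count how often each work function appears in the trace lines."""
    counts = Counter()
    for line in lines:
        match = FUNCTION_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def report(path, top=10):
    """Print the most frequently queued work functions found in a trace file."""
    with open(path, errors="replace") as handle:
        for name, hits in count_work_functions(handle).most_common(top):
            print(f"{hits:8d}  {name}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 tally.py out-cut.txt
    report(sys.argv[1])
```

A work function that dominates the tally is a reasonable first candidate to look at, although the count alone does not prove that it causes the CPU load.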

I hope this helps to find out where the high CPU load of the kworker processes comes from.

Cheers,

Dirk.

On 02.08.20 at 18:22, Salvatore Bonaccorso wrote:
Hi Dirk,

On Sun, Aug 02, 2020 at 03:44:09PM +0200, Salvatore Bonaccorso wrote:
Control: tags -1 + moreinfo

Hi Dirk

On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:
Package: src:linux
Version: 4.19.132-1
Severity: normal

Dear Maintainer,

after booting the kernel 4.19.0-10-amd64, there is a kworker process running
with a permanent high CPU load of almost 90% as reported by the "top"
command:

$ top
top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    64 root      20   0       0      0      0 R  86.7   0.0   0:47.41 kworker/0:2+pm
     9 root      20   0       0      0      0 S  20.0   0.0   0:08.84 ksoftirqd/0
   364 root     -51   0       0      0      0 S   6.7   0.0   0:00.50 irq/126-nvidia
  1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8   0:02.23 kwin_x11
     1 root      20   0  169652  10280   7740 S   0.0   0.1   0:01.56 systemd
     2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...

The expected result after booting the kernel 4.19.0-10-amd64 is a kworker
process with a CPU load close to 0%.

As a control, booting the previous kernel 4.19.0-9-amd64 does not show a
high CPU load for the kworker process. Instead, the kworker CPU load
reported by the "top" command is 0.0%.

Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.

Neither "dmesg" nor "journalctl -b" show any messages containing "kworker".

I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64
2.28-10.

If you need more information, I would be happy to provide it.

To find out what could be the cause, could you have a look at
https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging
This could help isolate why the kworker goes crazy.
In addition to the above, one more thing: can you reproduce
the issue when the kernel does not get tainted, that is, without loading
the proprietary, out-of-tree modules?

This is particularly important if the issue can be tracked down, found
in upstream and needs to be reported upstream.

Regards,
Salvatore

Attachment: dmesg.txt.gz
Description: application/gzip

Attachment: out-cut.txt.gz
Description: application/gzip
