Ever since upgrading to a Ryzen (1700X), I have experienced frequent
system freezes, which may be related to the problems discussed here.


The freeze mostly happens during a certain heavily threaded task with
disk io.

Symptoms:

* Screen completely freezes, including mouse pointer,
* Existing SSH connections die, no new connection can be established,
* System can no longer switch to text console,
* LEDs indicate **unceasing disk activity**,
* System still responds to pings,
* Alt-SysRq keys remain active, but cannot output to screen even if already in 
text console.

I've succeeded in capturing kernel logging after a freeze using
netconsole:

This timeout message appears:

    [35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds.
    [35042.581259]       Not tainted 4.15.0-62-generic #69-Ubuntu
    [35042.581262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
    [35042.581273] jbd2/dm-2-8     D    0   610      2 0x80000000
    [35042.581278] Call Trace:
    [35042.581290]  __schedule+0x24e/0x880
    [35042.581295]  ? bit_wait+0x60/0x60
    [35042.581300]  schedule+0x2c/0x80
    [35042.581304]  io_schedule+0x16/0x40
    [35042.581308]  bit_wait_io+0x11/0x60
    [35042.581313]  __wait_on_bit+0x4c/0x90
    [35042.581317]  out_of_line_wait_on_bit+0x90/0xb0
    [35042.581323]  ? bit_waitqueue+0x40/0x40
    [35042.581328]  __wait_on_buffer+0x32/0x40
    [35042.581333]  jbd2_journal_commit_transaction+0xdac/0x1730
    [35042.581337]  ? __switch_to_asm+0x41/0x70
    [35042.581343]  kjournald2+0xc8/0x270
    [35042.581347]  ? kjournald2+0xc8/0x270
    [35042.581351]  ? wait_woken+0x80/0x80
    [35042.581355]  kthread+0x121/0x140
    [35042.581359]  ? commit_timeout+0x20/0x20
    [35042.581363]  ? kthread_create_worker_on_cpu+0x70/0x70
    [35042.581366]  ret_from_fork+0x22/0x40
    [35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds.
    [35042.581259]       Not tainted 4.15.0-62-generic #69-Ubuntu
    [35042.581262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
    [35042.581273] jbd2/dm-2-8     D    0   610      2 0x80000000
    [35042.581278] Call Trace:
    [35042.581290]  __schedule+0x24e/0x880
    [35042.581295]  ? bit_wait+0x60/0x60
    [35042.581300]  schedule+0x2c/0x80
    [35042.581304]  io_schedule+0x16/0x40
    [35042.581308]  bit_wait_io+0x11/0x60
    [35042.581313]  __wait_on_bit+0x4c/0x90
    [35042.581317]  out_of_line_wait_on_bit+0x90/0xb0
    [35042.581323]  ? bit_waitqueue+0x40/0x40
    [35042.581328]  __wait_on_buffer+0x32/0x40
    [35042.581333]  jbd2_journal_commit_transaction+0xdac/0x1730
    [35042.581337]  ? __switch_to_asm+0x41/0x70
    [35042.581343]  kjournald2+0xc8/0x270
    [35042.581347]  ? kjournald2+0xc8/0x270
    [35042.581351]  ? wait_woken+0x80/0x80
    [35042.581355]  kthread+0x121/0x140
    [35042.581359]  ? commit_timeout+0x20/0x20
    [35042.581363]  ? kthread_create_worker_on_cpu+0x70/0x70
    [35042.581366]  ret_from_fork+0x22/0x40

Also, I have thousands of lines of output for blocked tasks. Most traces
look more or less like this:

    [34274.346748] sysrq: SysRq : Show Blocked State
    [34274.346766]   task                        PC stack   pid father
    [34274.346771] systemd         D    0     1      0 0x00000000
    [34274.346776] Call Trace:
    [34274.346786]  __schedule+0x24e/0x880
    [34274.346792]  ? mempool_alloc_slab+0x15/0x20
    [34274.346795]  schedule+0x2c/0x80
    [34274.346798]  schedule_timeout+0x15d/0x350
    [34274.346804]  ? __next_timer_interrupt+0xe0/0xe0
    [34274.346808]  ? wait_woken+0x80/0x80
    [34274.346812]  io_schedule_timeout+0x1e/0x50
    [34274.346815]  mempool_alloc+0x15d/0x190
    [34274.346820]  ? wait_woken+0x80/0x80
    [34274.346825]  bio_alloc_bioset+0xa9/0x1e0
    [34274.346830]  __split_and_process_non_flush+0x147/0x2c0
    [34274.346834]  __split_and_process_bio+0x139/0x2a0
    [34274.346838]  dm_make_request+0x7a/0xd0
    [34274.346843]  ? SyS_madvise+0x990/0x990
    [34274.346847]  generic_make_request+0x124/0x300
    [34274.346850]  submit_bio+0x73/0x140
    [34274.346853]  ? submit_bio+0x73/0x140
    [34274.346856]  ? get_swap_bio+0xcd/0x100
    [34274.346861]  __swap_writepage+0x323/0x3b0
    [34274.346865]  ? __frontswap_store+0x73/0x100
    [34274.346869]  swap_writepage+0x34/0x90
    [34274.346872]  pageout.isra.54+0x11b/0x350
    [34274.346878]  shrink_page_list+0x99a/0xbc0
    [34274.346883]  shrink_inactive_list+0x242/0x590
    [34274.346887]  shrink_node_memcg+0x364/0x770
    [34274.346892]  shrink_node+0xf7/0x300
    [34274.346896]  ? shrink_node+0xf7/0x300
    [34274.346900]  do_try_to_free_pages+0xc9/0x330
    [34274.346904]  try_to_free_pages+0xee/0x1b0
    [34274.346910]  __alloc_pages_slowpath+0x3fc/0xe00
    [34274.346914]  ? __switch_to_asm+0x35/0x70
    [34274.346917]  ? __switch_to_asm+0x35/0x70
    [34274.346920]  ? __switch_to_asm+0x35/0x70
    [34274.346924]  ? __switch_to_asm+0x35/0x70
    [34274.346929]  ? __switch_to_asm+0x35/0x70
    [34274.346932]  ? __switch_to_asm+0x41/0x70
    [34274.346936]  __alloc_pages_nodemask+0x29a/0x2c0
    [34274.346940]  alloc_pages_current+0x6a/0xe0
    [34274.346944]  __page_cache_alloc+0x81/0xa0
    [34274.346948]  __do_page_cache_readahead+0x113/0x2c0
    [34274.346952]  ? radix_tree_lookup_slot+0x22/0x50
    [34274.346956]  ? find_get_entry+0x1e/0x110
    [34274.346959]  filemap_fault+0x2ad/0x6f0
    [34274.346968]  ? filemap_fault+0x2ad/0x6f0
    [34274.346971]  ? page_add_file_rmap+0x134/0x180
    [34274.346975]  ? filemap_map_pages+0x181/0x390
    [34274.346980]  ext4_filemap_fault+0x31/0x44
    [34274.346748] sysrq: SysRq : Show Blocked State
    [34274.346984]  __do_fault+0x5b/0x115
    [34274.346988]  __handle_mm_fault+0xdef/0x1290
    [34274.346992]  handle_mm_fault+0xb1/0x210
    [34274.346997]  __do_page_fault+0x281/0x4b0
    [34274.347001]  do_page_fault+0x2e/0xe0
    [34274.347004]  ? page_fault+0x2f/0x50
    [34274.347008]  page_fault+0x45/0x50
    [34274.347011] RIP: 0033:0x7fa9446ee83a
    [34274.347015] RSP: 002b:00007ffcccb01470 EFLAGS: 00010206
    [34274.347019] RAX: 0000000000000001 RBX: 00005615eff63650 RCX: 
00007fa944bcebb7
    [34274.347021] RDX: 0000000000000093 RSI: 00007ffcccb01470 RDI: 
0000000000000000
    [34274.347025] RBP: 00007ffcccb01c60 R08: 0000000000000000 R09: 
0000000000000008
    [34274.347027] R10: 00000000ffffffff R11: 0000000000000000 R12: 
0000000000000001
    [34274.347032] R13: ffffffffffffffff R14: 00007ffcccb01470 R15: 
0000000000000001

Another detail that may be relevant it that show-task-states outputs
about 5000 lines of this kind:

    [34830.962684]     in-flight: 3235:kcryptd_crypt [dm_crypt], 
3237:kcryptd_crypt [dm_crypt], 6056:kcryptd_crypt [dm_crypt], 
6058:kcryptd_crypt [dm_crypt], 6055:kcryptd_crypt [dm_crypt], 
6057:kcryptd_crypt [dm_crypt], 3992:kcryptd_crypt [dm_crypt], 
4861:kcryptd_crypt [dm_crypt], 32431:kcryptd_crypt [dm_crypt], 
2682:kcryptd_crypt [dm_crypt], 4850:kcryptd_crypt [dm_crypt], 
1429:kcryptd_crypt [dm_crypt], 6054:kcryptd_crypt [dm_crypt], 
6060:kcryptd_crypt [dm_crypt], 4862:kcryptd_crypt [dm_crypt], 
1862:kcryptd_crypt [dm_crypt]
    [34830.874519] DefaultDispatch 22853    343906.389378     12131   120 
    [34830.962714]     delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
    [34830.962761] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
    [34830.881008]   .nr_spread_over                : 0
    [34830.962862] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]

Is there someone who can interpret all this? If it is helpful I can
attach the full blocked-tasks output.

(kernel version is 4.15.0-62-generic)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  
  We aregetting various kernel crash on a pretty new config.
  We're using Ryzen 1800X CPU with X370 Gaming Pro Carbon MB (7A32V1) using 
latest BIOS available (1.52)

  We are running Ubuntu 17.04 (amd64), we've tried different kernel version, 
native one and releases from http://kernel.ubuntu.com/~kernel-ppa/mainline/ too.
  Tested kernel version:

  native 17.04 kernel
  4.10.15

  Issues are the same, we're getting random freeze on the machine.

  Here is kern.log entry when happening :

  May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls 
on CPUs/tasks:
  May 10 22:41:56 dev2 kernel: [24366.187618]     0-...: (1 GPs behind) 
idle=49b/1/0 softirq=28561/28563 fqs=913449
  May 10 22:41:56 dev2 kernel: [24366.188977]     (detected by 12, t=1860207 
jiffies, g=10001, c=10000, q=4656)
  May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:
  May 10 22:41:56 dev2 kernel: [24366.190345] swapper/0       R  running task   
     0     0      0 0x00000008
  May 10 22:41:56 dev2 kernel: [24366.190348] Call Trace:
  May 10 22:41:56 dev2 kernel: [24366.190354]  ? native_safe_halt+0x6/0x10
  May 10 22:41:56 dev2 kernel: [24366.190355]  ? default_idle+0x20/0xd0
  May 10 22:41:56 dev2 kernel: [24366.190358]  ? arch_cpu_idle+0xf/0x20
  May 10 22:41:56 dev2 kernel: [24366.190360]  ? default_idle_call+0x23/0x30
  May 10 22:41:56 dev2 kernel: [24366.190362]  ? do_idle+0x16f/0x200
  May 10 22:41:56 dev2 kernel: [24366.190364]  ? cpu_startup_entry+0x71/0x80
  May 10 22:41:56 dev2 kernel: [24366.190366]  ? rest_init+0x77/0x80
  May 10 22:41:56 dev2 kernel: [24366.190368]  ? start_kernel+0x464/0x485
  May 10 22:41:56 dev2 kernel: [24366.190369]  ? 
early_idt_handler_array+0x120/0x120
  May 10 22:41:56 dev2 kernel: [24366.190371]  ? 
x86_64_start_reservations+0x24/0x26
  May 10 22:41:56 dev2 kernel: [24366.190372]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:41:56 dev2 kernel: [24366.190373]  ? start_cpu+0x14/0x14
  May 10 22:44:56 dev2 kernel: [24546.188093] INFO: rcu_sched detected stalls 
on CPUs/tasks:
  May 10 22:44:56 dev2 kernel: [24546.189461]     0-...: (1 GPs behind) 
idle=49b/1/0 softirq=28561/28563 fqs=935027
  May 10 22:44:56 dev2 kernel: [24546.190823]     (detected by 14, t=1905212 
jiffies, g=10001, c=10000, q=4740)
  May 10 22:44:56 dev2 kernel: [24546.192191] Task dump for CPU 0:
  May 10 22:44:56 dev2 kernel: [24546.192192] swapper/0       R  running task   
     0     0      0 0x00000008
  May 10 22:44:56 dev2 kernel: [24546.192195] Call Trace:
  May 10 22:44:56 dev2 kernel: [24546.192199]  ? native_safe_halt+0x6/0x10
  May 10 22:44:56 dev2 kernel: [24546.192201]  ? default_idle+0x20/0xd0
  May 10 22:44:56 dev2 kernel: [24546.192203]  ? arch_cpu_idle+0xf/0x20
  May 10 22:44:56 dev2 kernel: [24546.192204]  ? default_idle_call+0x23/0x30
  May 10 22:44:56 dev2 kernel: [24546.192206]  ? do_idle+0x16f/0x200
  May 10 22:44:56 dev2 kernel: [24546.192208]  ? cpu_startup_entry+0x71/0x80
  May 10 22:44:56 dev2 kernel: [24546.192210]  ? rest_init+0x77/0x80
  May 10 22:44:56 dev2 kernel: [24546.192211]  ? start_kernel+0x464/0x485
  May 10 22:44:56 dev2 kernel: [24546.192213]  ? 
early_idt_handler_array+0x120/0x120
  May 10 22:44:56 dev2 kernel: [24546.192214]  ? 
x86_64_start_reservations+0x24/0x26
  May 10 22:44:56 dev2 kernel: [24546.192215]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:44:56 dev2 kernel: [24546.192217]  ? start_cpu+0x14/0x14

  Depending on the kernel version, we've got NMI watchdog errors related to CPU 
stuck (mentioning the CPU core id, which is random).
  Crash is happening randomly, but in general after some hours (3-4h).

  Now, we've installed kernel 4.11.0-041100-generic #201705041534 this morning 
and waiting for crash...
  For now, the machine is not "used", at least, it's not CPU stressed...

  
  Thanks
  --- 
  ApportVersion: 2.20.4-0ubuntu4
  Architecture: amd64
  DistroRelease: Ubuntu 17.04
  InstallationDate: Installed on 2017-05-09 (1 days ago)
  InstallationMedia: Ubuntu-Server 17.04 "Zesty Zapus" - Release amd64 
(20170412)
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=fr_FR.UTF-8
   SHELL=/bin/bash
  Tags:  zesty
  Uname: Linux 4.11.0-041100-generic x86_64
  UnreportableReason: The running kernel is not an Ubuntu kernel
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1690085/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to