Public bug reported:

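After upgrading to the jammy kernel 5.15.0-144-generic, we hit a
serious regression: when the weekly fstrim timer ran, the kernel oopsed
with a NULL pointer dereference in the md RAID10 discard path (see the
dmesg output below).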

This bug was introduced by the commit "md/raid10: fix missing discard IO accounting"
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4a05f7ae33716d996c5ce56478a36a3ede1d76f2
which was backported to all stable kernels and became part of 5.15.181.

The issue was found earlier upstream[1] and in Debian[2], which led to
a fix being added to the Debian kernel and subsequently to 6.1.
However, the missing patch[3] never made it into the 5.15-stable
kernel, so the regression also affects Ubuntu jammy.


[1] https://lists.linaro.org/archives/list/[email protected]/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
[3] https://lore.kernel.org/all/[email protected]/
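
Since the crash is in the raid10 discard path, it may help to check
which md arrays on a host use the raid10 personality before the next
fstrim run. The following is only a rough sketch: the helper name
raid10_arrays is made up for illustration, and it assumes the usual
/proc/mdstat layout where each array line looks like
"md1 : active raid10 sda2[0] sdb2[1]". It does not check whether the
running kernel contains the fix.

```python
def raid10_arrays(mdstat_text):
    """Return names of md arrays whose personality is raid10.

    Hypothetical helper; expects the text of /proc/mdstat.
    """
    names = []
    for line in mdstat_text.splitlines():
        # Array lines have the form "<name> : <state> [flags] <level> <devices...>"
        name, sep, rest = line.partition(" : ")
        if sep and name.startswith("md") and "raid10" in rest.split():
            names.append(name)
    return names


def raid10_arrays_on_this_host(path="/proc/mdstat"):
    """Read mdstat from the given path; return [] if md is not present."""
    try:
        with open(path) as f:
            return raid10_arrays(f.read())
    except OSError:
        return []
```

An empty result from raid10_arrays_on_this_host() suggests that no
RAID10 md array is present, so this particular code path should not be
reached on that host.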


dmesg:

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor instruction fetch in kernel mode
kernel: #PF: error_code(0x0010) - not-present page
kernel: PGD 0 P4D 0 
kernel: Oops: 0010 [#1] SMP PTI
kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
kernel: Hardware name: FUJITSU /D3417-B2, BIOS V5.0.0.12 R1.27.0.SR.1 for D3417-B2x               06/10/2020
kernel: RIP: 0010:0x0
kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
kernel: FS:  00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  <TASK>
kernel:  mempool_alloc+0x61/0x1b0
kernel:  ? __kmalloc+0x179/0x330
kernel:  bio_alloc_bioset+0x9d/0x370
kernel:  ? r10bio_pool_alloc+0x26/0x30 [raid10]
kernel:  bio_clone_fast+0x1f/0x90
kernel:  md_account_bio+0x42/0x80
kernel:  raid10_handle_discard+0x56f/0x6b0 [raid10]
kernel:  raid10_make_request+0x147/0x180 [raid10]
kernel:  md_handle_request+0x12a/0x1b0
kernel:  ? submit_bio_checks+0x1a5/0x580
kernel:  md_submit_bio+0x76/0xc0
kernel:  __submit_bio+0x1a2/0x220
kernel:  ? mempool_alloc_slab+0x17/0x20
kernel:  ? mempool_alloc+0x61/0x1b0
kernel:  ? schedule_timeout+0x91/0x140
kernel:  __submit_bio_noacct+0x85/0x200
kernel:  submit_bio_noacct+0x4e/0x120
kernel:  ? __cond_resched+0x1a/0x60
kernel:  submit_bio+0x4a/0x130
kernel:  submit_bio_wait+0x5a/0xc0
kernel:  blkdev_issue_discard+0x7e/0xd0
kernel:  ext4_try_to_trim_range+0x2db/0x520
kernel:  ? ext4_mb_load_buddy_gfp+0x91/0x3e0
kernel:  ext4_trim_fs+0x313/0x510
kernel:  __ext4_ioctl+0x82c/0xef0
kernel:  ext4_ioctl+0xe/0x20
kernel:  __x64_sys_ioctl+0x92/0xd0
kernel:  x64_sys_call+0x1e5f/0x1fa0
kernel:  do_syscall_64+0x56/0xb0
kernel:  entry_SYSCALL_64_after_hwframe+0x6c/0xd6
kernel: RIP: 0033:0x7f6fffc0994f
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 >
kernel: RSP: 002b:00007ffdce979c30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 00007ffdce979d80 RCX: 00007f6fffc0994f
kernel: RDX: 00007ffdce979ca0 RSI: 00000000c0185879 RDI: 0000000000000003
kernel: RBP: 0000558436acccb0 R08: 0000558436acccb0 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
kernel: R13: 0000558436accfa0 R14: 0000558436acce80 R15: 0000558436acce80
kernel:  </TASK>
kernel: Modules linked in: tls tcp_diag udp_diag inet_diag bridge stp llc nft_counter nft_chain_nat nf_nat >
kernel:  xhci_pci_renesas wmi video
kernel: CR2: 0000000000000000
kernel: ---[ end trace db9334d27f904581 ]---
kernel: RIP: 0010:0x0
kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
kernel: FS:  00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: BUG: unable to handle page fault for address: ffffb57600000010

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed

** Summary changed:

- {Regression] kernel 5.15.0-144-generic -  discard broken with RAID10 
+ [Regression] kernel 5.15.0-144-generic -  discard broken with RAID10

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2117395

Title:
  [Regression] kernel 5.15.0-144-generic -  discard broken with RAID10

Status in linux package in Ubuntu:
  Confirmed
