[Kernel-packages] [Bug 2002748] [NEW] Unable to handle page fault under heavy M.2 load

Adam Petrycki Thu, 12 Jan 2023 14:38:38 -0800

Public bug reported:

Under heavy disk usage, such as backing up the disk to another drive, I
get the error:


[137133.774493] BUG: unable to handle page fault for address: 0000000041000064
[137133.774507] #PF: supervisor read access in kernel mode
[137133.774512] #PF: error_code(0x0000) - not-present page
[137133.774515] PGD 0 P4D 0
[137133.774520] Oops: 0000 [#1] SMP PTI
[137133.774525] CPU: 1 PID: 104 Comm: kswapd0 Tainted: G           OE     5.15.0-46-generic #49-Ubuntu
[137133.774531] Hardware name: HARDKERNEL ODROID-H2/ODROID-H2, BIOS 5.13 04/27/2020
[137133.774535] RIP: 0010:dqput.part.0+0x47/0x1f0
[137133.774545] Code: 53 48 89 fb 48 c7 c7 68 e4 dd 95 4c 8d 6b 30 48 83 ec 08 8b 15 1a b1 e0 01 e8 65 b8 23 00 48 c7 c7 00 97 40 95 e8 69 45 97 00 <8b> 43 64 83 f8 01 0f 8f 90 00 00 00 48 8b 83 80 00 00 00 a8 01 75
[137133.774551] RSP: 0018:ffffb166404879e0 EFLAGS: 00010246
[137133.774556] RAX: 0000000000000000 RBX: 0000000041000000 RCX: 0000000000038ee0
[137133.774560] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff95409700
[137133.774564] RBP: ffffb16640487a10 R08: 0000000000000002 R09: 0000000000000000
[137133.774567] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff95409700
[137133.774571] R13: 0000000041000030 R14: ffff98aeb0d14870 R15: 000000000010f224
[137133.774575] FS:  0000000000000000(0000) GS:ffff98b5dfc80000(0000) knlGS:0000000000000000
[137133.774579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[137133.774583] CR2: 0000000041000064 CR3: 0000000746010000 CR4: 0000000000350ee0
[137133.774587] Call Trace:
[137133.774591]  <TASK>
[137133.774596]  __dquot_drop+0x91/0xc0
[137133.774602]  dquot_drop+0x53/0x60
[137133.774606]  ext4_clear_inode+0x44/0xb0
[137133.774613]  ext4_evict_inode+0x86/0x670
[137133.774618]  evict+0xcf/0x190
[137133.774624]  prune_icache_sb+0x81/0xc0
[137133.774629]  super_cache_scan+0x169/0x200
[137133.774635]  do_shrink_slab+0x155/0x2c0
[137133.774641]  shrink_slab_memcg+0xcf/0x1e0
[137133.774646]  shrink_slab+0x10a/0x120
[137133.774651]  ? shrink_slab+0x10a/0x120
[137133.774655]  shrink_node_memcgs+0x188/0x1d0
[137133.774660]  shrink_node+0x15d/0x5c0
[137133.774666]  balance_pgdat+0x36e/0x800
[137133.774670]  ? try_to_del_timer_sync+0x22/0x90
[137133.774677]  kswapd+0x10c/0x1c0
[137133.774682]  ? balance_pgdat+0x800/0x800
[137133.774686]  kthread+0x127/0x150
[137133.774691]  ? set_kthread_struct+0x50/0x50
[137133.774696]  ret_from_fork+0x1f/0x30
[137133.774703]  </TASK>
[137133.774706] Modules linked in: tcp_diag udp_diag inet_diag xt_nat xt_tcpudp veth tls binfmt_misc xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge cfg80211 8021q garp mrp stp llc overlay intel_rapl_msr intel_rapl_common mei_hdcp nls_iso8859_1 intel_pmc_bxt intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate ee1004 ucsi_acpi r8125(OE) mei_me typec_ucsi mei typec mac_hid sch_fq_codel dm_multipath ipmi_devintf scsi_dh_rdac ipmi_msghandler scsi_dh_emc scsi_dh_alua ramoops mtd msr reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage i915 i2c_algo_bit ttm
[137133.774796]  drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core i2c_i801 xhci_pci crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_lpss_pci aesni_intel drm r8169 nvme ahci i2c_smbus libahci intel_lpss crypto_simd idma64 xhci_pci_renesas nvme_core cryptd realtek video pinctrl_geminilake
[137133.774849] CR2: 0000000041000064
[137133.774854] ---[ end trace 85a590ce733b1c95 ]---

The process and the exact message vary between occurrences, but the crash is consistently ext4-related.
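One way to read the oops above: the bytes in angle brackets in the `Code:` line (`<8b> 43 64`) are the faulting instruction, `mov eax, [rbx+0x64]`. The sketch below decodes only that one instruction form and checks it against the register dump: CR2 (the address that could not be read) is exactly RBX + 0x64, and RBX = 0x41000000 lies in the lower (user-space) half of the address space, which suggests a bad pointer rather than a valid kernel object. That RBX holds the `struct dquot` pointer being released in `dqput()` is an inference from the function name in RIP, not something the trace states, and this is an interpretation rather than a root-cause diagnosis.

```python
# Decode the faulting instruction from the oops and compare it with the
# register dump. Register values are copied verbatim from the trace.
RBX = 0x0000000041000000  # base register of the faulting load
R13 = 0x0000000041000030  # set earlier by "4c 8d 6b 30" = lea r13, [rbx+0x30]
CR2 = 0x0000000041000064  # address the CPU failed to read

# The byte marked <8b> in the "Code:" line plus the two bytes after it.
FAULTING_BYTES = bytes([0x8B, 0x43, 0x64])

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(insn: bytes) -> tuple[str, str, int]:
    """Decode 'mov r32, [base + disp8]' (opcode 0x8B, ModRM mod == 01, no REX/SIB)."""
    opcode, modrm, disp = insn
    if opcode != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the [base + disp8] form without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS32[reg], REGS64[rm], disp


dest, base, disp = decode_mov_disp8(FAULTING_BYTES)
print(f"mov {dest}, [{base}{disp:+#x}]")           # mov eax, [rbx+0x64]
print("CR2 == RBX + disp:", CR2 == RBX + disp)       # True
print("R13 == RBX + 0x30:", R13 == RBX + 0x30)       # True
print("RBX in lower (user) half:", RBX >> 63 == 0)   # True
```

R13 = RBX + 0x30 shows the same bad base value was already in use a few instructions earlier, before the faulting read.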

Description:    Ubuntu 22.04.1 LTS
Release:        22.04
(GNU/Linux 5.15.0-57-generic x86_64)
ODROID-H2+
Two Team Group CX2 2.5" 2TB SATA III 3D TLC Internal Solid State Drive (SSD) T253X6002T0C101 in RAID1 as a storage drive (not root)
Team Group MP33 M.2 2280 1TB PCIe 3.0 x4 with NVMe 1.3 3D NAND Internal Solid State Drive (SSD) TM8FP6001T0C101 (which seems to be the problem)
Google Coral
smartctl shows no errors
It can be days or weeks before the issue appears, unless I run something with high disk utilization for an extended period, such as a backup.

I would expect the computer to run smoothly.

Instead, it throws the message and grinds to a halt as disk access stops working.
The r8125 driver is the only non-distro module.
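The taint information in the trace is consistent with that last point: the `CPU:` line reports `Tainted: G           OE` and the module list marks `r8125(OE)`. A small sketch that decodes those letters, using a subset of the flag meanings from the kernel's tainted-kernels admin documentation (the helper names `decode_taint` and `tainting_modules` are illustrative, not an existing tool):

```python
import re

# Subset of taint letters, per the kernel's tainted-kernels documentation.
TAINT_MEANINGS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module loaded (all modules GPL-compatible)",
    "O": "externally-built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded in a kernel supporting module signatures",
}


def decode_taint(field: str) -> dict[str, str]:
    """Map each letter in a 'Tainted:' field to its meaning (spaces mean 'not set')."""
    return {
        c: TAINT_MEANINGS.get(c, "not covered by this sketch")
        for c in field
        if not c.isspace()
    }


def tainting_modules(modules_line: str) -> list[tuple[str, str]]:
    """Return (module, flags) for every entry written as name(FLAGS) in a module list."""
    return re.findall(r"([\w-]+)\(([A-Z]+)\)", modules_line)


taint_field = "G           OE"  # from the CPU: line of the oops
for letter, meaning in decode_taint(taint_field).items():
    print(f"{letter}: {meaning}")

# Excerpt of "Modules linked in:"; r8125 is the only flagged entry in the full list.
excerpt = "kvm rapl intel_cstate ee1004 ucsi_acpi r8125(OE) mei_me typec_ucsi"
print(tainting_modules(excerpt))  # [('r8125', 'OE')]
```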

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2002748
