Package: linux
Severity: important
Tags: patch
X-Debbugs-Cc: [email protected]

Package: linux
Version: 6.12.86-1
Severity: important
Tags: upstream

netfs_consume_read_data() calls local_bh_enable() in hard IRQ context
when reached via the cachefiles async read completion path.

The call chain is:

  nvme_irq                              [hard IRQ]
    blk_mq_end_request_batch
      iomap_dio_bio_end_io
        cachefiles_read_complete        [cachefiles]
          netfs_read_subreq_terminated  [netfs]
            netfs_consume_read_data     [netfs]
              __local_bh_enable_ip      *** WARNING ***

This fires the warning at kernel/softirq.c:386, the in_hardirq() check
in __local_bh_enable_ip(), because local_bh_enable() is called while in
hard IRQ context (PID 0, swapper, inside <IRQ>).

The issue is that netfs_consume_read_data() assumes it runs in process
or softirq context, but cachefiles_read_complete() can be called
directly from block layer IRQ completion (iomap_dio_bio_end_io), which
runs in hard IRQ context.

This triggers ~6 seconds after boot once NFS mounts with fsc come up
and the first cached reads complete from the NVMe backing store.

I have only seen this once so far. The system continued running but
cachefilesd appears to have died shortly after (possibly related).

Full trace:

  ------------[ cut here ]------------
  WARNING: CPU: 8 PID: 0 at kernel/softirq.c:386 __local_bh_enable_ip+0x4c/0x70
  CPU: 8 UID: 0 PID: 0 Comm: swapper/8 Not tainted 6.12.86+deb13-amd64 #1  Debian 6.12.86-1
  Hardware name: LENOVO 30GUS5KF00/1064, BIOS S0IKT61A 03/11/2024
  RIP: 0010:__local_bh_enable_ip+0x4c/0x70
  Code: 05 c0 9c f6 77 ff ff 74 25 65 ff 0d b6 9c f6 77 65 8b 05 af 9c f6 77 85 c0 74 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc <0f> 0b eb c3 65 66 83 3d af 9c f6 77 00 74 d0 e8 60 f9 ff ff eb c9
  RSP: 0018:ffffd1e1c03c0d50 EFLAGS: 00010002
  RAX: dead000000000122 RBX: ffff8f34dea9c240 RCX: ffff8f35e515c8f0
  RDX: ffff8f35e515c8f0 RSI: 0000000000000201 RDI: ffffffffc1411ede
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000008 R11: 0000000000000001 R12: ffff8f35e515ca68
  R13: fffff83c51b57700 R14: ffff8f35e515ca68 R15: ffff8f34dea9c240
  FS:  0000000000000000(0000) GS:ffff8f53bec00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000001378d000 CR3: 00000003c0ae6004 CR4: 0000000000f72ef0
  PKRU: 55555554
  Call Trace:
   <IRQ>
   netfs_consume_read_data.isra.0+0x53e/0xb80 [netfs]
   ? __pfx_cachefiles_read_complete+0x10/0x10 [cachefiles]
   netfs_read_subreq_terminated+0x2ab/0x3e0 [netfs]
   cachefiles_read_complete+0x42/0x110 [cachefiles]
   iomap_dio_bio_end_io+0xed/0x170
   blk_mq_end_request_batch+0x100/0x4b0
   ? __iommu_dma_unmap+0x220/0x2d0
   nvme_irq+0x83/0x90 [nvme]
   ? __pfx_nvme_pci_complete_batch+0x10/0x10 [nvme]
   __handle_irq_event_percpu+0x47/0x190
   handle_irq_event+0x38/0x80
   handle_edge_irq+0x8b/0x230
   __common_interrupt+0x42/0xe0
   common_interrupt+0x80/0xa0
   </IRQ>
   <TASK>
   asm_common_interrupt+0x26/0x40
  RIP: 0010:cpuidle_enter_state+0xc6/0x420
   cpuidle_enter+0x2d/0x40
   do_idle+0x142/0x2a0
   cpu_startup_entry+0x29/0x30
   start_secondary+0x11e/0x140
   common_startup_64+0x13e/0x141
   </TASK>
  ---[ end trace 0000000000000000 ]---

Related bugs:
  - #1134555 cachefilesd: silently exits when secctx is set on
    non-SELinux systems.
  - #1136748 cachefilesd: ships no native systemd unit; daemon crashes
    go undetected. Without Restart=on-failure, if this kernel warning
    does take cachefilesd down, the cache stays disabled for the
    remainder of uptime.

Configuration:
  - NFS v4 mounts with fsc (cachefilesd)
  - Cache backing store on local NVMe (XFS)
  - cachefilesd 0.10.10

-- System Information:
Debian Release: 13.4 (trixie)
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.12.86+deb13-amd64 (SMP w/32 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
