Control: tags -1 + moreinfo

Hi Olivier,

On Tue, May 04, 2021 at 10:01:17AM +0200, Olivier Monaco wrote:
> Package: src:linux
> Version: 5.10.19-1~bpo10+1
> Severity: important
>
> On a virtual machine running a NFS server the following kernel panic occurs:
> 2021-05-04T02:28:21.051193+02:00 storage-t20 kernel: [1736623.921391]
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.051214+02:00 storage-t20 kernel: [1736623.921406]
> refcount_t: addition on 0; use-after-free.
> 2021-05-04T02:28:21.051215+02:00 storage-t20 kernel: [1736623.921416]
> WARNING: CPU: 0 PID: 675 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051216+02:00 storage-t20 kernel: [1736623.921417] Modules
> linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common
> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit
> libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd
> cryptd glue_helper rapl vm
> w_balloon vmwgfx joydev evdev serio_raw pcspkr ttm sg drm_kms_helper vmw_vmci
> cec ac button nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse configfs
> ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c
> crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic
> crct10dif_pclmul
> crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata vmw_pvscsi
> scsi_mod i2c_piix4
> 2021-05-04T02:28:21.051217+02:00 storage-t20 kernel: [1736623.921488] CPU: 0
> PID: 675 Comm: nfsd Not tainted 5.10.0-0.bpo.4-amd64 #1 Debian
> 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.051218+02:00 storage-t20 kernel: [1736623.921488]
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
> Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921491] RIP:
> 0010:refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921492] Code:
> 05 d8 be 3f 01 01 e8 c3 0a 40 00 0f 0b c3 80 3d c8 be 3f 01 00 75 ce 48 c7 c7
> 30 6c 92 86 c6 05 b8 be 3f 01 01 e8 a4 0a 40 00 <0f> 0b c3 80 3d ab be 3f 01
> 00 75 af 48 c7 c7 08 6c 92 86 c6 05 9b
> 2021-05-04T02:28:21.051220+02:00 storage-t20 kernel: [1736623.921493] RSP:
> 0018:ffffb93f412b3c28 EFLAGS: 00010282
> 2021-05-04T02:28:21.051234+02:00 storage-t20 kernel: [1736623.921494] RAX:
> 0000000000000000 RBX: ffff9c2c913a0f80 RCX: 0000000000000027
> 2021-05-04T02:28:21.051236+02:00 storage-t20 kernel: [1736623.921495] RDX:
> 0000000000000027 RSI: ffff9c2d39e18a00 RDI: ffff9c2d39e18a08
> 2021-05-04T02:28:21.051237+02:00 storage-t20 kernel: [1736623.921495] RBP:
> ffff9c2c96e4f2a4 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] R10:
> 0000000000000001 R11: ffffb93f412b3a30 R12: ffff9c2c96e4f2a0
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] R13:
> ffff9c2c375f5450 R14: ffff9c2cb4f9fde8 R15: ffffffff86f75300
> 2021-05-04T02:28:21.051244+02:00 storage-t20 kernel: [1736623.921497] FS:
> 0000000000000000(0000) GS:ffff9c2d39e00000(0000) knlGS:0000000000000000
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921498] CS:
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921499] CR2:
> 00007f424807d5b9 CR3: 0000000103084001 CR4: 00000000007706f0
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921522] PKRU:
> 55555554
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921523] Call
> Trace:
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921546]
> nfsd_break_deleg_cb+0xb5/0xc0 [nfsd]
> 2021-05-04T02:28:21.051247+02:00 storage-t20 kernel: [1736623.921553]
> __break_lease+0x148/0x500
> 2021-05-04T02:28:21.051249+02:00 storage-t20 kernel: [1736623.921564] ?
> fill_pre_wcc+0x8f/0x180 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921566]
> notify_change+0x196/0x4c0
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921575] ?
> nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921586]
> nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921597]
> nfsd4_setattr+0x7b/0x140 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921611]
> nfsd4_proc_compound+0x355/0x680 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921623]
> nfsd_dispatch+0xd4/0x180 [nfsd]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921661]
> svc_process_common+0x390/0x6c0 [sunrpc]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921680] ?
> svc_recv+0x3c4/0x8a0 [sunrpc]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921688] ?
> nfsd_svc+0x300/0x300 [nfsd]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921695] ?
> nfsd_destroy+0x60/0x60 [nfsd]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921710]
> svc_process+0xb7/0xf0 [sunrpc]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921734]
> nfsd+0xe8/0x140 [nfsd]
> 2021-05-04T02:28:21.051257+02:00 storage-t20 kernel: [1736623.921737]
> kthread+0x116/0x130
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921738] ?
> kthread_park+0x80/0x80
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921741]
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.051259+02:00 storage-t20 kernel: [1736623.921743] ---[
> end trace f6e153631af275dc ]---
>
> It is followed by:
> 2021-05-04T02:28:21.101162+02:00 storage-t20 kernel: [1736623.971161]
> list_add corruption. prev->next should be next (ffff9c2d0875ecb8), but was
> ffff9c2c913a0fe8. (prev=ffff9c2c913a0fe8).
> 2021-05-04T02:28:21.101176+02:00 storage-t20 kernel: [1736623.971315]
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.101177+02:00 storage-t20 kernel: [1736623.971317] kernel
> BUG at lib/list_debug.c:28!
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971362] invalid
> opcode: 0000 [#1] SMP NOPTI
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971402] CPU: 1
> PID: 2435711 Comm: kworker/u256:5 Tainted: G W
> 5.10.0-0.bpo.4-amd64 #1 Debian 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.101179+02:00 storage-t20 kernel: [1736623.971456]
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
> Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971499]
> Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971515] RIP:
> 0010:__list_add_valid.cold.0+0x26/0x28
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971527] Code:
> 7b 1c bf ff 48 89 d1 48 c7 c7 18 71 92 86 48 89 c2 e8 02 2a ff ff 0f 0b 48 89
> c1 4c 89 c6 48 c7 c7 70 71 92 86 e8 ee 29 ff ff <0f> 0b 48 89 fe 48 89 c2 48
> c7 c7 00 72 92 86 e8 da 29 ff ff 0f 0b
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971564] RSP:
> 0018:ffffb93f4075fe48 EFLAGS: 00010246
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971579] RAX:
> 0000000000000075 RBX: ffff9c2c913a0fe8 RCX: 0000000000000000
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971594] RDX:
> 0000000000000000 RSI: ffff9c2d39e58a00 RDI: ffff9c2d39e58a00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971608] RBP:
> ffff9c2c913a1018 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971623] R10:
> 0000000000000001 R11: ffffb93f4075fc58 R12: ffff9c2d0875ec00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971637] R13:
> ffff9c2c913a0fe8 R14: ffff9c2d0875ecb8 R15: ffff9c2c913a1050
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971653] FS:
> 0000000000000000(0000) GS:ffff9c2d39e40000(0000) knlGS:0000000000000000
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971684] CS:
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971735] CR2:
> 00007f8d98002698 CR3: 0000000103904005 CR4: 00000000007706e0
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971774] PKRU:
> 55555554
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971781] Call
> Trace:
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971805]
> nfsd4_cb_recall_prepare+0x2aa/0x2f0 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971829]
> nfsd4_run_cb_work+0xe9/0x150 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971843]
> process_one_work+0x1aa/0x340
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971855] ?
> create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971865]
> worker_thread+0x30/0x390
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.971875] ?
> create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972279]
> kthread+0x116/0x130
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972663] ?
> kthread_park+0x80/0x80
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973043]
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973411] Modules
> linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common
> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit
> libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd
> cryptd glue_helper rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm
> sg drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd grace
> drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor
> raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif
> crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32c_intel
> psmouse vmxnet3 ata_piix libata vmw_pvscsi scsi_mod i2c_piix4
> 2021-05-04T02:28:21.101190+02:00 storage-t20 kernel: [1736623.976175] ---[
> end trace f6e153631af275dd ]---
>
> We are running a VMware vSphere platform running 9 groups of virtual
> machines. Each group include a VM with NFS for file sharing and 3 VM with NFS
> clients, so we are running 9 independent file servers. This issue occured on
> 2 different file servers with the same kernel version and the same error.
> There is no direct link between the two servers except the fact they are
> running the same software, on the same hadware for the same pupose.
>
> It also occured earlier 4 times on 3 different servers which was running
> kernel 5.10.13-1~bpo10+1 (package linux-image-5.10.0-0.bpo.3-amd64).

From the above report I suspect you do not have an easy way to trigger the
issue, right? Did you also see the issue with the most current version in
buster-backports, 5.10.24-1~bpo10+1?

Ideally I think this issue should just be forwarded upstream, while keeping
us in the loop. Could you do that?

I did not immediately find anything similar on
https://lore.kernel.org/linux-nfs/ (there is a recent report about
inter-server copy, but that looks different from this one).

So this would mean mailing:

"J. Bruce Fields" <bfie...@fieldses.org> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
Chuck Lever <chuck.le...@oracle.com> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-...@vger.kernel.org (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-ker...@vger.kernel.org (open list)

Regards,
Salvatore