Control: tags -1 + moreinfo

Hi Olivier,

On Tue, May 04, 2021 at 10:01:17AM +0200, Olivier Monaco wrote:
> Package: src:linux
> Version: 5.10.19-1~bpo10+1
> Severity: important
> 
> On a virtual machine running an NFS server the following kernel panic occurs:
> 2021-05-04T02:28:21.051193+02:00 storage-t20 kernel: [1736623.921391] 
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.051214+02:00 storage-t20 kernel: [1736623.921406] 
> refcount_t: addition on 0; use-after-free.
> 2021-05-04T02:28:21.051215+02:00 storage-t20 kernel: [1736623.921416] 
> WARNING: CPU: 0 PID: 675 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051216+02:00 storage-t20 kernel: [1736623.921417] Modules 
> linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common 
> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit 
> libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd 
> cryptd glue_helper rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm 
> sg drm_kms_helper vmw_vmci 
> cec ac button nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse configfs 
> ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c 
> crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic 
> crct10dif_pclmul
>  crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata vmw_pvscsi 
> scsi_mod i2c_piix4
> 2021-05-04T02:28:21.051217+02:00 storage-t20 kernel: [1736623.921488] CPU: 0 
> PID: 675 Comm: nfsd Not tainted 5.10.0-0.bpo.4-amd64 #1 Debian 
> 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.051218+02:00 storage-t20 kernel: [1736623.921488] 
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference 
> Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921491] RIP: 
> 0010:refcount_warn_saturate+0x6d/0xf0
> 2021-05-04T02:28:21.051219+02:00 storage-t20 kernel: [1736623.921492] Code: 
> 05 d8 be 3f 01 01 e8 c3 0a 40 00 0f 0b c3 80 3d c8 be 3f 01 00 75 ce 48 c7 c7 
> 30 6c 92 86 c6 05 b8 be 3f 01 01 e8 a4 0a 40 00 <0f> 0b c3 80 3d ab be 3f 01 
> 00 75 af 48 c7 c7 08 6c 92 86 c6 05 9b
> 2021-05-04T02:28:21.051220+02:00 storage-t20 kernel: [1736623.921493] RSP: 
> 0018:ffffb93f412b3c28 EFLAGS: 00010282
> 2021-05-04T02:28:21.051234+02:00 storage-t20 kernel: [1736623.921494] RAX: 
> 0000000000000000 RBX: ffff9c2c913a0f80 RCX: 0000000000000027
> 2021-05-04T02:28:21.051236+02:00 storage-t20 kernel: [1736623.921495] RDX: 
> 0000000000000027 RSI: ffff9c2d39e18a00 RDI: ffff9c2d39e18a08
> 2021-05-04T02:28:21.051237+02:00 storage-t20 kernel: [1736623.921495] RBP: 
> ffff9c2c96e4f2a4 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] R10: 
> 0000000000000001 R11: ffffb93f412b3a30 R12: ffff9c2c96e4f2a0
> 2021-05-04T02:28:21.051238+02:00 storage-t20 kernel: [1736623.921496] R13: 
> ffff9c2c375f5450 R14: ffff9c2cb4f9fde8 R15: ffffffff86f75300
> 2021-05-04T02:28:21.051244+02:00 storage-t20 kernel: [1736623.921497] FS:  
> 0000000000000000(0000) GS:ffff9c2d39e00000(0000) knlGS:0000000000000000
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921498] CS:  
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.051245+02:00 storage-t20 kernel: [1736623.921499] CR2: 
> 00007f424807d5b9 CR3: 0000000103084001 CR4: 00000000007706f0
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921522] PKRU: 
> 55555554
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921523] Call 
> Trace:
> 2021-05-04T02:28:21.051246+02:00 storage-t20 kernel: [1736623.921546]  
> nfsd_break_deleg_cb+0xb5/0xc0 [nfsd]
> 2021-05-04T02:28:21.051247+02:00 storage-t20 kernel: [1736623.921553]  
> __break_lease+0x148/0x500
> 2021-05-04T02:28:21.051249+02:00 storage-t20 kernel: [1736623.921564]  ? 
> fill_pre_wcc+0x8f/0x180 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921566]  
> notify_change+0x196/0x4c0
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921575]  ? 
> nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051250+02:00 storage-t20 kernel: [1736623.921586]  
> nfsd_setattr+0x2e6/0x470 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921597]  
> nfsd4_setattr+0x7b/0x140 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921611]  
> nfsd4_proc_compound+0x355/0x680 [nfsd]
> 2021-05-04T02:28:21.051251+02:00 storage-t20 kernel: [1736623.921623]  
> nfsd_dispatch+0xd4/0x180 [nfsd]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921661]  
> svc_process_common+0x390/0x6c0 [sunrpc]
> 2021-05-04T02:28:21.051253+02:00 storage-t20 kernel: [1736623.921680]  ? 
> svc_recv+0x3c4/0x8a0 [sunrpc]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921688]  ? 
> nfsd_svc+0x300/0x300 [nfsd]
> 2021-05-04T02:28:21.051254+02:00 storage-t20 kernel: [1736623.921695]  ? 
> nfsd_destroy+0x60/0x60 [nfsd]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921710]  
> svc_process+0xb7/0xf0 [sunrpc]
> 2021-05-04T02:28:21.051255+02:00 storage-t20 kernel: [1736623.921734]  
> nfsd+0xe8/0x140 [nfsd]
> 2021-05-04T02:28:21.051257+02:00 storage-t20 kernel: [1736623.921737]  
> kthread+0x116/0x130
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921738]  ? 
> kthread_park+0x80/0x80
> 2021-05-04T02:28:21.051258+02:00 storage-t20 kernel: [1736623.921741]  
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.051259+02:00 storage-t20 kernel: [1736623.921743] ---[ 
> end trace f6e153631af275dc ]---
> 
> It is followed by:
> 2021-05-04T02:28:21.101162+02:00 storage-t20 kernel: [1736623.971161] 
> list_add corruption. prev->next should be next (ffff9c2d0875ecb8), but was 
> ffff9c2c913a0fe8. (prev=ffff9c2c913a0fe8).
> 2021-05-04T02:28:21.101176+02:00 storage-t20 kernel: [1736623.971315] 
> ------------[ cut here ]------------
> 2021-05-04T02:28:21.101177+02:00 storage-t20 kernel: [1736623.971317] kernel 
> BUG at lib/list_debug.c:28!
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971362] invalid 
> opcode: 0000 [#1] SMP NOPTI
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971402] CPU: 1 
> PID: 2435711 Comm: kworker/u256:5 Tainted: G        W         
> 5.10.0-0.bpo.4-amd64 #1 Debian 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.101179+02:00 storage-t20 kernel: [1736623.971456] 
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference 
> Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971499] 
> Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971515] RIP: 
> 0010:__list_add_valid.cold.0+0x26/0x28
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971527] Code: 
> 7b 1c bf ff 48 89 d1 48 c7 c7 18 71 92 86 48 89 c2 e8 02 2a ff ff 0f 0b 48 89 
> c1 4c 89 c6 48 c7 c7 70 71 92 86 e8 ee 29 ff ff <0f> 0b 48 89 fe 48 89 c2 48 
> c7 c7 00 72 92 86 e8 da 29 ff ff 0f 0b
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971564] RSP: 
> 0018:ffffb93f4075fe48 EFLAGS: 00010246
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971579] RAX: 
> 0000000000000075 RBX: ffff9c2c913a0fe8 RCX: 0000000000000000
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971594] RDX: 
> 0000000000000000 RSI: ffff9c2d39e58a00 RDI: ffff9c2d39e58a00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971608] RBP: 
> ffff9c2c913a1018 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971623] R10: 
> 0000000000000001 R11: ffffb93f4075fc58 R12: ffff9c2d0875ec00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971637] R13: 
> ffff9c2c913a0fe8 R14: ffff9c2d0875ecb8 R15: ffff9c2c913a1050
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971653] FS:  
> 0000000000000000(0000) GS:ffff9c2d39e40000(0000) knlGS:0000000000000000
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971684] CS:  
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971735] CR2: 
> 00007f8d98002698 CR3: 0000000103904005 CR4: 00000000007706e0
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971774] PKRU: 
> 55555554
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971781] Call 
> Trace:
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971805]  
> nfsd4_cb_recall_prepare+0x2aa/0x2f0 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971829]  
> nfsd4_run_cb_work+0xe9/0x150 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971843]  
> process_one_work+0x1aa/0x340
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971855]  ? 
> create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971865]  
> worker_thread+0x30/0x390
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.971875]  ? 
> create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972279]  
> kthread+0x116/0x130
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972663]  ? 
> kthread_park+0x80/0x80
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973043]  
> ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973411] Modules 
> linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common 
> vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit 
> libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd 
> cryptd glue_helper rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm 
> sg drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd grace 
> drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor 
> raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif 
> crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32c_intel 
> psmouse vmxnet3 ata_piix libata vmw_pvscsi scsi_mod i2c_piix4
> 2021-05-04T02:28:21.101190+02:00 storage-t20 kernel: [1736623.976175] ---[ 
> end trace f6e153631af275dd ]---
> 
> We are running a VMware vSphere platform running 9 groups of virtual 
> machines. Each group includes a VM with NFS for file sharing and 3 VMs with NFS 
> clients, so we are running 9 independent file servers. This issue occurred on 
> 2 different file servers with the same kernel version and the same error. 
> There is no direct link between the two servers except the fact they are 
> running the same software, on the same hardware for the same purpose.
> 
> It also occurred earlier 4 times on 3 different servers which were running 
> kernel 5.10.13-1~bpo10+1 (package linux-image-5.10.0-0.bpo.3-amd64).

From the above report I suspect you do not have an easy way to trigger the
issue, right? Did you also see the issue with the most current
version in buster-backports, 5.10.24-1~bpo10+1?

Ideally I think this issue should just be forwarded upstream, but
please keep us in the loop. Could you do that?

I did not immediately find anything similar on
https://lore.kernel.org/linux-nfs/ (there is a recent report about
inter-server copy, but that looks different from this one).

So this would mean mailing:

"J. Bruce Fields" <bfie...@fieldses.org> (supporter:KERNEL NFSD, SUNRPC, AND 
LOCKD SERVERS)
Chuck Lever <chuck.le...@oracle.com> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD 
SERVERS)
linux-...@vger.kernel.org (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-ker...@vger.kernel.org (open list)

Regards,
Salvatore
