** Changed in: linux (Ubuntu Disco)
Status: In Progress => Fix Committed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1858832
Title:
invalid opcode xdr_buf_read_netobj on nfs4+krb5i directory
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Disco:
Fix Committed
Bug description:
== SRU Justification ==
The xdr_shrink_pagelen() call added by commit 5f1bc39 (SUNRPC: Fix buffer
handling of GSS MIC without slack), which was applied to the Disco tree
through the stable update process, sometimes triggers the following kernel
trace when the number of bytes to remove from buf->pages is larger than
buf->page_len:
[ 49.420081] ------------[ cut here ]------------
[ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
[ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
[ 49.420137] Call Trace:
[ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ 49.420154] ? kzfree+0x2d/0x40
[ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
[ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
[ 49.420216] ? __switch_to_asm+0x35/0x70
[ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
[ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 49.420245] process_one_work+0x1fd/0x400
[ 49.420247] worker_thread+0x34/0x410
[ 49.420249] kthread+0x121/0x140
[ 49.420250] ? process_one_work+0x400/0x400
[ 49.420252] ? kthread_park+0xb0/0xb0
[ 49.420254] ret_from_fork+0x22/0x40
== Fixes ==
* e8d70b32 (SUNRPC: Fix another issue with MIC buffer space)
Instead of calling BUG_ON(), this patch caps the number of bytes that
xdr_shrink_pagelen() moves.
Only the Disco kernel needs this patch: Bionic and earlier do not have
5f1bc39, and this fix has already been applied to Eoan and later.
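The change can be illustrated with a minimal, hypothetical C model. It
assumes only the lengths matter. struct toy_xdr_buf and the
toy_shrink_pagelen_*() functions are illustrative names, not kernel APIs.
The real xdr_shrink_pagelen() also moves the underlying bytes and has to
respect the buffer's capacity, which this sketch ignores. The "old" variant
mirrors the BUG_ON() check, returning -1 instead of crashing. The "capped"
variant mirrors the behaviour described for e8d70b32.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct xdr_buf: only the
 * lengths of the head, page area and tail are modelled here. */
struct toy_xdr_buf {
    size_t head_len;
    size_t page_len;
    size_t tail_len;
};

/* Pre-fix behaviour (modelled): a request larger than page_len is
 * rejected with -1 where the kernel executed BUG_ON() and oopsed. */
static int toy_shrink_pagelen_old(struct toy_xdr_buf *buf, size_t len)
{
    if (len > buf->page_len)
        return -1;
    buf->page_len -= len;   /* bytes leave the page area ... */
    buf->tail_len += len;   /* ... and are accounted to the tail */
    return 0;
}

/* Post-fix behaviour (modelled): cap len to what the page area
 * holds and report how many bytes were actually moved. */
static size_t toy_shrink_pagelen_capped(struct toy_xdr_buf *buf, size_t len)
{
    size_t total = buf->page_len + buf->tail_len;

    if (len > buf->page_len)
        len = buf->page_len;   /* cap instead of BUG_ON() */
    buf->page_len -= len;
    buf->tail_len += len;

    /* Bytes are shifted between regions, never created or lost. */
    assert(buf->page_len + buf->tail_len == total);
    return len;
}
```

Take page_len = 16 and a request to move 28 bytes; these are arbitrary
illustrative values, not taken from the trace. In that case
toy_shrink_pagelen_old() returns -1, which is the point where the real kernel
oopsed. toy_shrink_pagelen_capped() instead moves 16 bytes and leaves page_len
at 0.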
== Test ==
Test kernel can be found here:
https://people.canonical.com/~phlin/kernel/lp-1858832-sunrpc-bufferhandling/
It has been stress-tested by the bug reporter, Michael, and the issue can
no longer be reproduced.
== Regression Potential ==
Low. The patch only changes the number of bytes to shrink; the change is
confined to a single driver and has been tested successfully.
== Original Bug Report ==
RELEASE=19.3
CODENAME=tricia
EDITION="Cinnamon"
DESCRIPTION="Linux Mint 19.3 Tricia"
DESKTOP=Gnome
TOOLKIT=GTK
NEW_FEATURES_URL=https://www.linuxmint.com/rel_tricia_cinnamon_whatsnew.php
RELEASE_NOTES_URL=https://www.linuxmint.com/rel_tricia_cinnamon.php
USER_GUIDE_URL=https://www.linuxmint.com/documentation.php
GRUB_TITLE=Linux Mint 19.3 Cinnamon
My home directory is mounted via NFS from a local server, using NFSv4 with
krb5i. Stressing the mounted directory or its subdirectories eventually
leads to the following stack trace. Triggers include starting Firefox,
starting Thunderbird, and the login itself; the crash is nearly guaranteed
when compiling. The affected process then hangs. Accessing the mounted
directory afterwards (for example, running ls) readily produces further,
similar stack traces and hangs that process too.
I am currently running an AMD 3950X on an ASUS Crosshair VII Hero Wi-Fi
(X470 chipset), but I had the same issues with an Intel 6700K on an
ASUS Crosshair VIII Hero in fall 2019. I couldn't be bothered to report
the bug back then, so I kept running a working kernel (~5.0.0-15, I think)
without updating it. After Christmas I replaced that Intel machine with
the AMD machine, reinstalled Linux Mint, installed all updates, and ran
into this issue again.
[ 49.420081] ------------[ cut here ]------------
[ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
[ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
[ 49.420137] Call Trace:
[ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ 49.420154] ? kzfree+0x2d/0x40
[ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
[ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
[ 49.420216] ? __switch_to_asm+0x35/0x70
[ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
[ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 49.420245] process_one_work+0x1fd/0x400
[ 49.420247] worker_thread+0x34/0x410
[ 49.420249] kthread+0x121/0x140
[ 49.420250] ? process_one_work+0x400/0x400
[ 49.420252] ? kthread_park+0xb0/0xb0
[ 49.420254] ret_from_fork+0x22/0x40
[ 49.420255] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd
grace fscache edac_mce_amd snd_hda_codec_hdmi joydev kvm hid_roccat_koneplus
hid_roccat irqbypass hid_roccat_common nvidia_uvm(OE) nvidia_drm(POE)
nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
snd_hda_codec_ca0132 snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib
snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq_midi snd_pcm
nvidia(POE) ghash_clmulni_intel snd_seq_midi_event eeepc_wmi aesni_intel
snd_rawmidi asus_wmi sparse_keymap aes_x86_64 crypto_simd cryptd video
glue_helper snd_seq drm_kms_helper snd_seq_device mxm_wmi wmi_bmof input_leds
drm snd_timer ipmi_devintf snd serio_raw ccp ipmi_msghandler fb_sys_fops
syscopyarea sysfillrect sysimgblt soundcore k10temp mac_hid sch_fq_codel
asus_wmi_sensors(OE) parport_pc sunrpc ppdev lp parport ip_tables x_tables
autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash
dm_log hid_plantronics
[ 49.420282] hid_generic usbhid hid igb i2c_piix4 nvme dca ahci
i2c_algo_bit nvme_core libahci gpio_amdpt wmi gpio_generic
[ 49.420293] ---[ end trace 75bda976d7f1c02d ]---
[ 49.420305] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420306] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420307] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420309] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
[ 49.420310] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
[ 49.420311] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
[ 49.420312] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
[ 49.420312] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
[ 49.420314] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
[ 49.420315] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420316] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
[Jan 1 03:45] ------------[ cut here ]------------
[ +0,000002] kernel BUG at /build/linux-hwe-W9CF8Q/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ +0,000006] invalid opcode: 0000 [#1] SMP NOPTI
[ +0,000002] CPU: 4 PID: 28219 Comm: kworker/u64:2 Tainted: P OE 5.0.0-35-generic #38~18.04.1-Ubuntu
[ +0,000001] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ +0,000011] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ +0,000010] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
[ +0,000001] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
[ +0,000000] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000001] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0
[ +0,000001] Call Trace:
[ +0,000009] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ +0,000003] ? kzfree+0x2d/0x40
[ +0,000002] ? crypto_destroy_tfm+0x73/0xb0
[ +0,000003] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? kmem_cache_alloc_trace+0x42/0x1c0
[ +0,000002] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ +0,000008] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000008] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ +0,000007] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000007] call_decode+0x166/0x8b0 [sunrpc]
[ +0,000002] ? __switch_to_asm+0x41/0x70
[ +0,000006] ? call_refreshresult+0x130/0x130 [sunrpc]
[ +0,000006] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ +0,000007] rpc_async_schedule+0x12/0x20 [sunrpc]
[ +0,000002] process_one_work+0x1fd/0x400
[ +0,000002] worker_thread+0x34/0x410
[ +0,000001] kthread+0x121/0x140
[ +0,000001] ? process_one_work+0x400/0x400
[ +0,000002] ? kthread_park+0xb0/0xb0
[ +0,000001] ret_from_fork+0x22/0x40
[ +0,000001] Modules linked in: nls_utf8 udf crc_itu_t rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi kvm
irqbypass joydev crct10dif_pclmul nvidia_uvm(OE) crc32_pclmul
hid_roccat_koneplus nvidia_drm(POE) hid_roccat ghash_clmulni_intel
hid_roccat_common nvidia_modeset(POE) nvidia(POE) snd_usb_audio
snd_hda_codec_realtek
snd_usbmidi_lib snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132
aesni_intel input_leds snd_hda_intel eeepc_wmi snd_hda_codec asus_wmi
aes_x86_64 drm_kms_helper crypto_simd snd_hda_core snd_seq_midi cryptd
sparse_keymap snd_hwdep snd_seq_midi_event video glue_helper wmi_bmof mxm_wmi
serio_raw drm snd_rawmidi snd_pcm ipmi_devintf ipmi_msghandler snd_seq
fb_sys_fops syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer
k10temp ccp snd soundcore mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc
ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress
raid6_pq libcrc32c dm_mirror dm_region_hash dm_log
[ +0,000019] hid_plantronics hid_generic usbhid hid igb i2c_piix4 dca
i2c_algo_bit ahci nvme libahci nvme_core wmi gpio_amdpt gpio_generic
[ +0,000008] ---[ end trace 4314523bc923f697 ]---
[ +0,000007] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
[ +0,000000] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
[ +0,000001] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000000] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0
Using a small compile stress test, I tested the following kernels, which
seem to run fine:
* 4.15.0-69
* 4.15.0-70
* 4.15.0-72
* 5.0.0-32 (current daily driver; runs without a hassle; longest test run
2d 4h 33m; I am writing this bug report on it)
The following kernels, however, are not stable:
* 5.0.0-35 (second stack trace above)
* 5.0.0-37 (first stack trace above; as you can see, it already throws the
error 49 s after boot)
* 5.3.0-24
$ lspci | grep -i ether
06:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
$ mount | grep filer
filer:/ on /share type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
filer:/home/michael on /share/home/michael type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
$ cat /etc/fstab | grep -i filer
filer:/ /share/ nfs4 nfsvers=4,sec=krb5i,rw,x-systemd.automount,soft,intr,tcp,noatime 0 0