Thanks for testing!

I will SRU this to our Disco kernel.
https://lists.ubuntu.com/archives/kernel-team/2020-January/106822.html

** Package changed: linux-hwe (Ubuntu) => linux (Ubuntu)

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: Incomplete => Fix Released

** Changed in: linux (Ubuntu Disco)
       Status: New => In Progress

** Changed in: linux (Ubuntu Disco)
     Assignee: (unassigned) => Po-Hsu Lin (cypressyew)

** Tags added: disco

** Description changed:

+ == SRU Justification ==
+ The xdr_shrink_pagelen() added in commit 5f1bc39 (SUNRPC: Fix buffer
+ handling of GSS MIC without slack), which was applied to the Disco tree
+ via the stable update process, sometimes triggers the following kernel
+ BUG when the number of bytes to remove from buf->pages is larger than
+ buf->page_len:
+ 
+ [ 49.420081] ------------[ cut here ]------------
+ [ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
+ [ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
+ [ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
+ [ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
+ [ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
+ [ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
+ [ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
+ [ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
+ [ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
+ [ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
+ [ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
+ [ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
+ [ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
+ [ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
+ [ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ [ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
+ [ 49.420137] Call Trace:
+ [ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
+ [ 49.420154] ? kzfree+0x2d/0x40
+ [ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
+ [ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
+ [ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
+ [ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
+ [ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
+ [ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
+ [ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
+ [ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
+ [ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
+ [ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
+ [ 49.420216] ? __switch_to_asm+0x35/0x70
+ [ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
+ [ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
+ [ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
+ [ 49.420245] process_one_work+0x1fd/0x400
+ [ 49.420247] worker_thread+0x34/0x410
+ [ 49.420249] kthread+0x121/0x140
+ [ 49.420250] ? process_one_work+0x400/0x400
+ [ 49.420252] ? kthread_park+0xb0/0xb0
+ [ 49.420254] ret_from_fork+0x22/0x40
+ 
+ == Fixes ==
+ * e8d70b32 (SUNRPC: Fix another issue with MIC buffer space)
+ Instead of calling BUG_ON(), this patch simply caps the number of bytes
+ that xdr_shrink_pagelen() will move.
+ 
+ Only the Disco kernel needs this patch: Bionic and earlier releases do
+ not have 5f1bc39, and the fix has already been applied to Eoan and later.
+ 
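The difference between the old and the patched behaviour can be sketched
roughly as follows. This is only a simplified Python model, not the kernel
code: the function names shrink_pagelen_bug_on and shrink_pagelen_capped are
hypothetical, and only the length check against buf->page_len described
above is modelled (no data is actually moved).

```python
def shrink_pagelen_bug_on(page_len: int, length: int) -> int:
    """Model of the pre-fix check: an oversized length is treated as fatal."""
    if length > page_len:
        # Stands in for BUG_ON(); in the kernel this produces the trace above.
        raise RuntimeError("BUG: len > buf->page_len")
    return page_len - length


def shrink_pagelen_capped(page_len: int, length: int) -> int:
    """Model of the fix: never remove more bytes than the pages hold."""
    length = min(length, page_len)  # cap instead of crashing
    return page_len - length


# A request for 16 bytes against a 12-byte page area no longer crashes;
# it is capped so that the page area simply shrinks to zero.
remaining = shrink_pagelen_capped(page_len=12, length=16)
```

With the capped variant the caller gets a smaller (possibly empty) page
area instead of a crashed rpciod worker.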
+ == Test ==
+ Test kernel can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1858832-sunrpc-bufferhandling/
+ 
+ It has been stress-tested by the bug reporter, Michael, and the issue
+ can no longer be reproduced.
+ 
+ == Regression Potential ==
+ Low. The patch only changes the number of bytes to shrink; the change
+ is limited to a single driver and the test result is positive.
+ 
+ 
+ == Original Bug Report ==
  RELEASE=19.3
  CODENAME=tricia
  EDITION="Cinnamon"
  DESCRIPTION="Linux Mint 19.3 Tricia"
  DESKTOP=Gnome
  TOOLKIT=GTK
  NEW_FEATURES_URL=https://www.linuxmint.com/rel_tricia_cinnamon_whatsnew.php
  RELEASE_NOTES_URL=https://www.linuxmint.com/rel_tricia_cinnamon.php
  USER_GUIDE_URL=https://www.linuxmint.com/documentation.php
  GRUB_TITLE=Linux Mint 19.3 Cinnamon
  
  My home directory is mounted from a local server via NFSv4 with krb5i.
  Stressing the mounted directory or its sub-directories (sometimes
  starting Firefox, sometimes starting Thunderbird, nearly guaranteed when
  compiling, sometimes the login itself) eventually leads to the following
  stack trace. The corresponding process is then stuck, and accessing the
  mounted directory (for example by calling ls) easily yields further,
  similar stack traces and causes that process to get stuck as well.
  
  Currently I am running an AMD 3950x on an ASUS Crosshair VII Hero Wifi
  (chipset x470), but I had the same issues with an Intel 6700K on an ASUS
  Crosshair VIII Hero in fall of 2019. I couldn't be bothered back then to
  report the bug, so I just kept running a working kernel (~5.0.0-15 I
  think) without updating it. After Christmas I replaced said Intel machine
  with the AMD machine, re-installed Linux Mint, installed all updates and
  therefore ran into this issue again.
  
  [   49.420081] ------------[ cut here ]------------
  [   49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
  [   49.420092] invalid opcode: 0000 [#1] SMP NOPTI
  [   49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P           OE     5.0.0-37-generic #40~18.04.1-Ubuntu
  [   49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
  [   49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
  [   49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
  [   49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
  [   49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
  [   49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
  [   49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
  [   49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
  [   49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
  [   49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
  [   49.420134] FS:  0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
  [   49.420135] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
  [   49.420137] Call Trace:
  [   49.420150]  xdr_buf_read_netobj+0x122/0x180 [sunrpc]
  [   49.420154]  ? kzfree+0x2d/0x40
  [   49.420158]  ? crypto_destroy_tfm+0x73/0xb0
  [   49.420162]  gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
  [   49.420164]  ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
  [   49.420167]  gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
  [   49.420170]  ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
  [   49.420172]  ? gss_validate+0x242/0x300 [auth_rpcgss]
  [   49.420184]  ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
  [   49.420194]  rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
  [   49.420204]  ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
  [   49.420213]  call_decode+0x1c4/0x880 [sunrpc]
  [   49.420216]  ? __switch_to_asm+0x35/0x70
  [   49.420224]  ? rpc_check_timeout+0x130/0x130 [sunrpc]
  [   49.420233]  __rpc_execute+0x7a/0x3f0 [sunrpc]
  [   49.420242]  rpc_async_schedule+0x12/0x20 [sunrpc]
  [   49.420245]  process_one_work+0x1fd/0x400
  [   49.420247]  worker_thread+0x34/0x410
  [   49.420249]  kthread+0x121/0x140
  [   49.420250]  ? process_one_work+0x400/0x400
  [   49.420252]  ? kthread_park+0xb0/0xb0
  [   49.420254]  ret_from_fork+0x22/0x40
  [   49.420255] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi joydev kvm hid_roccat_koneplus hid_roccat irqbypass hid_roccat_common nvidia_uvm(OE) nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132 snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq_midi snd_pcm nvidia(POE) ghash_clmulni_intel snd_seq_midi_event eeepc_wmi aesni_intel snd_rawmidi asus_wmi sparse_keymap aes_x86_64 crypto_simd cryptd video glue_helper snd_seq drm_kms_helper snd_seq_device mxm_wmi wmi_bmof input_leds drm snd_timer ipmi_devintf snd serio_raw ccp ipmi_msghandler fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore k10temp mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc sunrpc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash dm_log hid_plantronics
  [   49.420282]  hid_generic usbhid hid igb i2c_piix4 nvme dca ahci i2c_algo_bit nvme_core libahci gpio_amdpt wmi gpio_generic
  [   49.420293] ---[ end trace 75bda976d7f1c02d ]---
  [   49.420305] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
  [   49.420306] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
  [   49.420307] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
  [   49.420309] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
  [   49.420310] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
  [   49.420311] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
  [   49.420312] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
  [   49.420312] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
  [   49.420314] FS:  0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
  [   49.420315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   49.420316] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
  
  .
  
  [Jan 1 03:45] ------------[ cut here ]------------
  [  +0,000002] kernel BUG at /build/linux-hwe-W9CF8Q/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
  [  +0,000006] invalid opcode: 0000 [#1] SMP NOPTI
  [  +0,000002] CPU: 4 PID: 28219 Comm: kworker/u64:2 Tainted: P           OE     5.0.0-35-generic #38~18.04.1-Ubuntu
  [  +0,000001] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
  [  +0,000011] Workqueue: rpciod rpc_async_schedule [sunrpc]
  [  +0,000010] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
  [  +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
  [  +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
  [  +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
  [  +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
  [  +0,000001] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
  [  +0,000000] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
  [  +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
  [  +0,000001] FS:  0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
  [  +0,000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  +0,000001] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0
  [  +0,000001] Call Trace:
  [  +0,000009]  xdr_buf_read_netobj+0x122/0x180 [sunrpc]
  [  +0,000003]  ? kzfree+0x2d/0x40
  [  +0,000002]  ? crypto_destroy_tfm+0x73/0xb0
  [  +0,000003]  gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
  [  +0,000002]  ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
  [  +0,000002]  gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
  [  +0,000002]  ? kmem_cache_alloc_trace+0x42/0x1c0
  [  +0,000002]  ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
  [  +0,000002]  ? gss_validate+0x242/0x300 [auth_rpcgss]
  [  +0,000008]  ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
  [  +0,000008]  rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
  [  +0,000007]  ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
  [  +0,000007]  call_decode+0x166/0x8b0 [sunrpc]
  [  +0,000002]  ? __switch_to_asm+0x41/0x70
  [  +0,000006]  ? call_refreshresult+0x130/0x130 [sunrpc]
  [  +0,000006]  __rpc_execute+0x7a/0x3f0 [sunrpc]
  [  +0,000007]  rpc_async_schedule+0x12/0x20 [sunrpc]
  [  +0,000002]  process_one_work+0x1fd/0x400
  [  +0,000002]  worker_thread+0x34/0x410
  [  +0,000001]  kthread+0x121/0x140
  [  +0,000001]  ? process_one_work+0x400/0x400
  [  +0,000002]  ? kthread_park+0xb0/0xb0
  [  +0,000001]  ret_from_fork+0x22/0x40
  [  +0,000001] Modules linked in: nls_utf8 udf crc_itu_t rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi kvm irqbypass joydev crct10dif_pclmul nvidia_uvm(OE) crc32_pclmul hid_roccat_koneplus nvidia_drm(POE) hid_roccat ghash_clmulni_intel hid_roccat_common nvidia_modeset(POE) nvidia(POE) snd_usb_audio snd_hda_codec_realtek snd_usbmidi_lib snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132 aesni_intel input_leds snd_hda_intel eeepc_wmi snd_hda_codec asus_wmi aes_x86_64 drm_kms_helper crypto_simd snd_hda_core snd_seq_midi cryptd sparse_keymap snd_hwdep snd_seq_midi_event video glue_helper wmi_bmof mxm_wmi serio_raw drm snd_rawmidi snd_pcm ipmi_devintf ipmi_msghandler snd_seq fb_sys_fops syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer k10temp ccp snd soundcore mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash dm_log
  [  +0,000019]  hid_plantronics hid_generic usbhid hid igb i2c_piix4 dca i2c_algo_bit ahci nvme libahci nvme_core wmi gpio_amdpt gpio_generic
  [  +0,000008] ---[ end trace 4314523bc923f697 ]---
  [  +0,000007] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
  [  +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
  [  +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
  [  +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX: 000000000000001c
  [  +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI: ffff8b96c0856650
  [  +0,000000] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09: 0000000000000000
  [  +0,000001] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12: ffff8b96c0856650
  [  +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15: ffffffffc0eb8920
  [  +0,000001] FS:  0000000000000000(0000) GS:ffff8b97de700000(0000) knlGS:0000000000000000
  [  +0,000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  +0,000000] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4: 0000000000340ee0
  
  .
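Both traces share the same signature: a "kernel BUG at" line pointing into
net/sunrpc/xdr.c and a RIP inside xdr_shrink_pagelen. A minimal sketch for
spotting that signature in saved dmesg output might look like this; the
helper name find_bug_signature and the returned dictionary layout are
assumptions made for illustration only.

```python
import re

# "kernel BUG at <path>:<line>!"; \s+ tolerates a line wrap after "at".
BUG_RE = re.compile(r"kernel BUG at\s+(\S+):(\d+)!")
# "RIP: 0010:<function>+0x<offset>/0x<size>"
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_bug_signature(log_text):
    """Return source file, line and faulting function, or None if absent."""
    bug = BUG_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    if bug is None or rip is None:
        return None
    return {
        "file": bug.group(1),
        "line": int(bug.group(2)),
        "function": rip.group(1),
    }


# Excerpt of the first trace above.
sample = (
    "[   49.420084] kernel BUG at "
    "/build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!\n"
    "[   49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]\n"
)
signature = find_bug_signature(sample)
```

For the excerpt above, the function field is xdr_shrink_pagelen and the
file ends in net/sunrpc/xdr.c.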
  
  Using a small compile stress test, I have tested the following kernels,
  which seem to run fine:
   * 4.15.0-69
   * 4.15.0-70
   * 4.15.0-72
   * 5.0.0-32 (current daily driver, runs without a hassle, max test length
     2d 4h 33m; I am writing this bug report on it)
  
  But the following kernels are not stable:
   * 5.0.0-35 (second stack trace above)
   * 5.0.0-37 (first stack trace above; as you can see, it already throws
     the error 49s after boot)
   * 5.3.0-24
  
  $ lspci | grep -i ether
  06:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
  
  $ mount | grep filer
  filer:/ on /share type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
  filer:/home/michael on /share/home/michael type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
  
  $ cat /etc/fstab  | grep -i filer
  filer:/               /share/         nfs4 nfsvers=4,sec=krb5i,rw,x-systemd.automount,soft,intr,tcp,noatime 0 0
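The affected setup in this report is an NFSv4 mount using sec=krb5i. As a
rough sketch, a line of `mount` output like the ones above could be checked
for that combination as follows; parse_mount_line and uses_nfs4_krb5i are
hypothetical helper names, and the parser only assumes the
"<source> on <target> type <fstype> (<options>)" layout shown above.

```python
def parse_mount_line(line):
    """Split one line of `mount` output into source, target, type, options."""
    left, _, opts = line.strip().partition(" (")
    source, rest = left.split(" on ", 1)
    target, fstype = rest.split(" type ", 1)
    options = {}
    for item in opts.rstrip(")").split(","):
        key, _, value = item.partition("=")
        # Flags without a value (rw, soft, ...) are stored as True.
        options[key] = value if value else True
    return {
        "source": source,
        "target": target,
        "fstype": fstype.strip(),
        "options": options,
    }


def uses_nfs4_krb5i(entry):
    """True for the configuration described in this report."""
    return entry["fstype"] == "nfs4" and entry["options"].get("sec") == "krb5i"


line = (
    "filer:/ on /share type nfs4 (rw,noatime,vers=4.0,rsize=1048576,"
    "wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,"
    "clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)"
)
entry = parse_mount_line(line)
affected_config = uses_nfs4_krb5i(entry)
```

This only identifies the mount configuration; whether a given system
actually hits the BUG still depends on the running kernel version.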

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1858832

Title:
  invalid opcode xdr_buf_read_netobj on nfs4+krb5i directory

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Disco:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1858832/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
