William,
yes, I built a Bionic kernel with this commit included, and it resolved the bug.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1904848

Title:
  Ubuntu 18.04- call trace in kernel buffer when unloading ib_ipoib
  module

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  [Impact]
  Unloading the ib_ipoib module causes a call trace to be logged in the
  kernel ring buffer.

  Bisecting the Bionic kernel shows that the warning was first exposed by
  commit 616e695435e3 ("workqueue: Try to catch flush_work() without
  INIT_WORK()"), which is included as of version 4.15.0-59.66.

  [Test Case]

  # modprobe ib_ipoib
  # modprobe ib_ipoib -r
  # dmesg
  [  306.277717] ------------[ cut here ]------------
  [  306.277738] WARNING: CPU: 10 PID: 2148 at /build/linux-RJNBJC/linux-4.15.0/kernel/workqueue.c:2906 __flush_work+0x1f8/0x210
  [  306.277739] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c 
ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter bridge stp llc binfmt_misc intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp rpcrdma rdma_ucm ib_umad ib_uverbs 
coretemp ib_iser rdma_cm kvm_intel kvm iw_cm irqbypass ib_ipoib(-) libiscsi 
scsi_transport_iscsi ib_cm joydev input_leds crct10dif_pclmul crc32_pclmul 
mgag200 ttm drm_kms_helper drm hpilo ghash_clmulni_intel pcbc i2c_algo_bit 
ipmi_ssif fb_sys_fops syscopyarea sysfillrect sysimgblt aesni_intel aes_x86_64 
crypto_simd ioatdma glue_helper shpchp cryptd dca intel_cstate intel_rapl_perf
  [  306.277790]  serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si 
ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core 
hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core 
scsi_transport_sas ptp pps_core devlink
  [  306.277817] CPU: 10 PID: 2148 Comm: modprobe Not tainted 4.15.0-124-generic #127-Ubuntu
  [  306.277818] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
  [  306.277823] RIP: 0010:__flush_work+0x1f8/0x210
  [  306.277825] RSP: 0018:ffffbdeb47ecfcd8 EFLAGS: 00010286
  [  306.277827] RAX: 0000000000000024 RBX: ffff993a5c3d8ec8 RCX: 
0000000000000006
  [  306.277829] RDX: 0000000000000000 RSI: ffff99429ef16498 RDI: 
ffff99429ef16490
  [  306.277830] RBP: ffffbdeb47ecfd48 R08: 000000000000050d R09: 
0000000000000004
  [  306.277832] R10: ffffe263a058c1c0 R11: 0000000000000001 R12: 
ffff993a5c3d8ec8
  [  306.277833] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: 
ffffffffb00a9800
  [  306.277835] FS:  00007fa1124a9540(0000) GS:ffff99429ef00000(0000) 
knlGS:0000000000000000
  [  306.277837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  306.277839] CR2: 000055b1c5007bb0 CR3: 0000000fcf36c002 CR4: 
00000000001606e0
  [  306.277840] Call Trace:
  [  306.277850]  __cancel_work_timer+0x136/0x1b0
  [  306.277881]  ? mlx5_core_destroy_qp+0x99/0xd0 [mlx5_core]
  [  306.277886]  cancel_delayed_work_sync+0x13/0x20
  [  306.277909]  mlx5e_detach_netdev+0x83/0x90 [mlx5_core]
  [  306.277931]  mlx5_rdma_netdev_free+0x30/0x80 [mlx5_core]
  [  306.277941]  mlx5_ib_free_rdma_netdev+0xe/0x10 [mlx5_ib]
  [  306.277948]  ipoib_remove_one+0xe4/0x180 [ib_ipoib]
  [  306.277965]  ib_unregister_client+0x171/0x1e0 [ib_core]
  [  306.277972]  ipoib_cleanup_module+0x15/0x2f [ib_ipoib]
  [  306.277978]  SyS_delete_module+0x1ab/0x2d0
  [  306.277983]  do_syscall_64+0x73/0x130
  [  306.277989]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  [  306.277992] RIP: 0033:0x7fa111fc1047
  [  306.277993] RSP: 002b:00007ffc0db32298 EFLAGS: 00000206 ORIG_RAX: 
00000000000000b0
  [  306.277996] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 
00007fa111fc1047
  [  306.277997] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 
00005614be46cd08
  [  306.277999] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 
0000000000000000
  [  306.278000] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 
00005614be46cd08
  [  306.278002] R13: 0000000000000001 R14: 00005614be46cd08 R15: 
00007ffc0db33680
  [  306.278004] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 
66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 
31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
  [  306.278035] ---[ end trace 652f7759937172a2 ]---
  [  306.646061] ------------[ cut here ]------------
  [  306.646077] WARNING: CPU: 6 PID: 2148 at /build/linux-RJNBJC/linux-4.15.0/kernel/workqueue.c:2906 __flush_work+0x1f8/0x210
  [  306.646078] Modules linked in: nfsv3 nfs fscache xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c 
ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter bridge stp llc binfmt_misc intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp rpcrdma rdma_ucm ib_umad ib_uverbs 
coretemp ib_iser rdma_cm kvm_intel kvm iw_cm irqbypass ib_ipoib(-) libiscsi 
scsi_transport_iscsi ib_cm joydev input_leds crct10dif_pclmul crc32_pclmul 
mgag200 ttm drm_kms_helper drm hpilo ghash_clmulni_intel pcbc i2c_algo_bit 
ipmi_ssif fb_sys_fops syscopyarea sysfillrect sysimgblt aesni_intel aes_x86_64 
crypto_simd ioatdma glue_helper shpchp cryptd dca intel_cstate intel_rapl_perf
  [  306.646123]  serio_raw acpi_power_meter lpc_ich mac_hid ipmi_si 
ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
sch_fq_codel ip_tables x_tables autofs4 mlx5_ib mlx4_ib mlx4_en ib_core 
hid_generic psmouse mlx5_core usbhid hid pata_acpi hpsa tg3 mlxfw mlx4_core 
scsi_transport_sas ptp pps_core devlink
  [  306.646146] CPU: 6 PID: 2148 Comm: modprobe Tainted: G        W        4.15.0-124-generic #127-Ubuntu
  [  306.646148] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
  [  306.646152] RIP: 0010:__flush_work+0x1f8/0x210
  [  306.646154] RSP: 0018:ffffbdeb47ecfcd8 EFLAGS: 00010286
  [  306.646156] RAX: 0000000000000024 RBX: ffff9942970b8ec8 RCX: 
0000000000000006
  [  306.646158] RDX: 0000000000000000 RSI: ffff99429ee16498 RDI: 
ffff99429ee16490
  [  306.646159] RBP: ffffbdeb47ecfd48 R08: 0000000000000533 R09: 
0000000000000004
  [  306.646161] R10: ffffe2639fa66740 R11: 0000000000000001 R12: 
ffff9942970b8ec8
  [  306.646162] R13: 0000000000000001 R14: ffffbdeb47ecfd78 R15: 
ffffffffb00a9800
  [  306.646164] FS:  00007fa1124a9540(0000) GS:ffff99429ee00000(0000) 
knlGS:0000000000000000
  [  306.646166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  306.646167] CR2: 000055dd889e4a30 CR3: 0000000fcf36c006 CR4: 
00000000001606e0
  [  306.646169] Call Trace:
  [  306.646177]  __cancel_work_timer+0x136/0x1b0
  [  306.646205]  ? mlx5_core_destroy_qp+0x99/0xd0 [mlx5_core]
  [  306.646210]  cancel_delayed_work_sync+0x13/0x20
  [  306.646233]  mlx5e_detach_netdev+0x83/0x90 [mlx5_core]
  [  306.646255]  mlx5_rdma_netdev_free+0x30/0x80 [mlx5_core]
  [  306.646264]  mlx5_ib_free_rdma_netdev+0xe/0x10 [mlx5_ib]
  [  306.646271]  ipoib_remove_one+0xe4/0x180 [ib_ipoib]
  [  306.646287]  ib_unregister_client+0x171/0x1e0 [ib_core]
  [  306.646295]  ipoib_cleanup_module+0x15/0x2f [ib_ipoib]
  [  306.646300]  SyS_delete_module+0x1ab/0x2d0
  [  306.646305]  do_syscall_64+0x73/0x130
  [  306.646310]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  [  306.646313] RIP: 0033:0x7fa111fc1047
  [  306.646314] RSP: 002b:00007ffc0db32298 EFLAGS: 00000206 ORIG_RAX: 
00000000000000b0
  [  306.646317] RAX: ffffffffffffffda RBX: 00005614be46cca0 RCX: 
00007fa111fc1047
  [  306.646318] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 
00005614be46cd08
  [  306.646319] RBP: 00005614be46cca0 R08: 00007ffc0db31241 R09: 
0000000000000000
  [  306.646321] R10: 00007fa11203dc40 R11: 0000000000000206 R12: 
00005614be46cd08
  [  306.646322] R13: 0000000000000001 R14: 00005614be46cd08 R15: 
00007ffc0db33680
  [  306.646325] Code: 24 03 80 c9 f0 e9 5b ff ff ff 48 c7 c7 18 50 0b b1 e8 ed 
66 04 00 0f 0b 31 c0 e9 75 ff ff ff 48 c7 c7 18 50 0b b1 e8 d8 66 04 00 <0f> 0b 
31 c0 e9 60 ff ff ff e8 5a 35 fe ff 66 2e 0f 1f 84 00 00
  [  306.646355] ---[ end trace 652f7759937172a3 ]---

  [Fix]
  The root cause of this warning is that a delayed work item belonging to
  the IPoIB net devices is cancelled without ever having been initialized.
  The solution is to make sure it is always initialized.
  This solution is implemented in the attached, very small (one-line) patch.
  Please note that this patch is not upstream; it is based on the following
  upstream commits, which introduced similar functionality in v4.20-rc1:

  303211b44ce3 net/mlx5e: Always initialize update stats delayed work
  182570b26223 net/mlx5e: Gather common netdev init/cleanup functionality in one place

  Applying these two commits cleanly to the Bionic tree requires
  additional patches that might introduce a large change, so I think it
  is better (if possible) to use the attached patch.
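
  For readers unfamiliar with the failure mode, the following is a rough
  userspace sketch of it, not the attached patch and not kernel code. Every
  name prefixed with model_ is hypothetical; the field name
  update_stats_work is only borrowed from the upstream commit title, and
  treating a NULL callback as "never initialized" is an assumption about
  what the workqueue sanity check looks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct delayed_work (hypothetical). */
struct model_delayed_work {
    void (*func)(struct model_delayed_work *work); /* NULL until initialized */
    bool pending;
};

/* Hypothetical private data of a net device that owns one work item. */
struct model_priv {
    struct model_delayed_work update_stats_work;
};

/* Dummy work callback; it only clears the pending flag. */
static void model_update_stats(struct model_delayed_work *work)
{
    work->pending = false;
}

/* Plays the role of INIT_DELAYED_WORK(): records the callback. */
static void model_init_delayed_work(struct model_delayed_work *work,
                                    void (*func)(struct model_delayed_work *))
{
    work->func = func;
    work->pending = false;
}

/*
 * Plays the role of cancel_delayed_work_sync(). In this model it returns
 * false and prints a warning when the work item was never initialized
 * (assumption: detected via a NULL callback), and true otherwise.
 */
static bool model_cancel_delayed_work_sync(struct model_delayed_work *work)
{
    if (work->func == NULL) {
        fprintf(stderr, "WARNING: cancelling uninitialized work\n");
        return false;
    }
    work->pending = false;
    return true;
}

/* Buggy shape (assumed): only non-IPoIB devices initialize the work item. */
static void model_attach_buggy(struct model_priv *priv, bool is_ipoib)
{
    *priv = (struct model_priv){0};
    if (!is_ipoib)
        model_init_delayed_work(&priv->update_stats_work, model_update_stats);
}

/* Fixed shape: the work item is initialized for every kind of device. */
static void model_attach_fixed(struct model_priv *priv, bool is_ipoib)
{
    (void)is_ipoib; /* no longer affects initialization */
    *priv = (struct model_priv){0};
    model_init_delayed_work(&priv->update_stats_work, model_update_stats);
}

/* Common detach path: cancels unconditionally, as the call trace suggests. */
static bool model_detach(struct model_priv *priv)
{
    return model_cancel_delayed_work_sync(&priv->update_stats_work);
}

/* Small self-check contrasting the buggy and the fixed attach paths. */
void model_self_check(void)
{
    struct model_priv priv;

    model_attach_buggy(&priv, true);
    assert(!model_detach(&priv)); /* warning path is taken */

    model_attach_fixed(&priv, true);
    assert(model_detach(&priv));  /* clean cancel */
}
```

  Calling model_self_check() from a main() exercises both paths. The point
  of the sketch is only that a teardown path which cancels unconditionally
  needs a setup path which initializes unconditionally.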

  [Regression Potential]
  Regression risk is low, since this introduces a small fix whose
  equivalent was also accepted upstream in v4.20.

