Hi Shlomo & Or,

We've seen the neigh->list list corruption warning below during testing.
In Dongsu's and my opinion, several places also need
netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around accesses to
neigh->list. I tried adding netif_tx_lock/netif_tx_unlock to
ipoib_cm_destroy_tx, and it improved the situation. There are some other
places in ipoib_main.c and ipoib_mcast.c, but I don't know which lock
should be added there. If you can take some time to look into it, that
would be great.
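
To illustrate what the warning below reports, here is a minimal
userspace sketch. It is not driver code and not a proposed patch. It
assumes the corruption comes from the same entry being removed from a
list twice, which is what a ->next value of LIST_POISON1 suggests. All
names (model_neigh, model_tx_lock, model_list_del and so on) are
hypothetical, the poison values are stand-ins, and a pthread mutex
stands in for netif_tx_lock(_bh).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's list poison values (model only). */
#define MODEL_POISON1 ((struct list_node *)(uintptr_t)0x100100)
#define MODEL_POISON2 ((struct list_node *)(uintptr_t)0x200200)

struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical stand-in for a neighbour entry linked through ->list. */
struct model_neigh {
    struct list_node list;
    int id;
};

/* Stand-in for the TX lock; a driver would use netif_tx_lock_bh(). */
static pthread_mutex_t model_tx_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void model_list_add(struct list_node *node, struct list_node *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Returns false and leaves the list alone if the entry was already
 * deleted, the condition the list debug check warns about. */
static bool model_list_del(struct list_node *node)
{
    if (node->next == MODEL_POISON1 || node->prev == MODEL_POISON2) {
        fprintf(stderr, "list_del corruption: entry already removed\n");
        return false;
    }
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = MODEL_POISON1;
    node->prev = MODEL_POISON2;
    return true;
}

/* Unlink with the lock held, so the check and the pointer updates are
 * one step with respect to any other path that takes the same lock. */
static bool model_neigh_unlink(struct model_neigh *neigh)
{
    bool removed;

    pthread_mutex_lock(&model_tx_lock);
    removed = model_list_del(&neigh->list);
    pthread_mutex_unlock(&model_tx_lock);
    return removed;
}

/* Small self-check: a second unlink of the same entry is refused. */
static int model_demo(void)
{
    struct list_node head;
    struct model_neigh n = { .id = 1 };

    model_list_init(&head);
    model_list_add(&n.list, &head);
    assert(model_neigh_unlink(&n));   /* first delete succeeds */
    assert(head.next == &head);       /* list is empty again */
    return model_neigh_unlink(&n) ? 1 : 0;
}
```

Calling model_demo() performs two unlinks of one entry; the second is
detected instead of corrupting the list. The lock only helps if every
path that touches the list takes the same one, which is the open
question for the other call sites.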



May 17 15:17:57 ib2 kernel: [  274.910792] ib0: failed to send RTU: -22
May 17 15:17:59 ib2 kernel: [  276.118006] ib0: enabling connected mode
will cause multicast packet drops
May 17 15:18:01 ib2 kernel: [  278.557566] ib0: enabling connected mode
will cause multicast packet drops
May 17 15:18:02 ib2 kernel: [  279.793565] ib0: failed to send cm req: -22
May 17 15:18:02 ib2 kernel: [  279.793713] ------------[ cut here
]------------
May 17 15:18:02 ib2 kernel: [  279.793779] WARNING: at
lib/list_debug.c:49 __list_del_entry+0x63/0xd0()
May 17 15:18:02 ib2 kernel: [  279.793840] Hardware name: System Product
Name
May 17 15:18:02 ib2 kernel: [  279.793898] list_del corruption,
ffff8801f9708740->next is LIST_POISON1 (dead000000100100)
May 17 15:18:02 ib2 kernel: [  279.794013] Modules linked in: rdma_ucm
rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib
ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables
ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative
cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm
psmouse powernow_k8 tpm_tis tpm tpm_bios serio_raw edac_core mperf evdev
shpchp processor edac_mce_amd microcode pci_hotplug i2c_piix4
asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod
crc_t10dif mlx4_core ahci libahci r8169 libata scsi_mod [last unloaded:
scsi_wait_scan]
May 17 15:18:02 ib2 kernel: [  279.796082] Pid: 220, comm: kworker/u:5
Not tainted 3.4.23-pserver+ #98
May 17 15:18:02 ib2 kernel: [  279.796142] Call Trace:
May 17 15:18:02 ib2 kernel: [  279.796202]  [<ffffffff8103c21f>]
warn_slowpath_common+0x7f/0xc0
May 17 15:18:02 ib2 kernel: [  279.796266]  [<ffffffff8103c316>]
warn_slowpath_fmt+0x46/0x50
May 17 15:18:02 ib2 kernel: [  279.796328]  [<ffffffff81428ff3>]
__list_del_entry+0x63/0xd0
May 17 15:18:02 ib2 kernel: [  279.796828]  [<ffffffff81429071>]
list_del+0x11/0x40
May 17 15:18:02 ib2 kernel: [  279.796897]  [<ffffffffa02b7978>]
ipoib_cm_tx_start+0x2e8/0x3b0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [  279.796964]  [<ffffffff8105da3a>]
process_one_work+0x19a/0x5c0
May 17 15:18:02 ib2 kernel: [  279.797026]  [<ffffffff8105d9cd>] ?
process_one_work+0x12d/0x5c0
May 17 15:18:02 ib2 kernel: [  279.797096]  [<ffffffffa02b7690>] ?
ipoib_cm_destroy_tx+0xc0/0xc0 [ib_ipoib]
May 17 15:18:02 ib2 kernel: [  279.797162]  [<ffffffff8105f7b5>]
worker_thread+0x175/0x380
May 17 15:18:02 ib2 kernel: [  279.797224]  [<ffffffff8105f640>] ?
manage_workers+0x210/0x210
May 17 15:18:02 ib2 kernel: [  279.797285]  [<ffffffff81064d5e>]
kthread+0xbe/0xd0
May 17 15:18:02 ib2 kernel: [  279.797346]  [<ffffffff8109f1d0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 17 15:18:02 ib2 kernel: [  279.797412]  [<ffffffff81746b74>]
kernel_thread_helper+0x4/0x10
May 17 15:18:02 ib2 kernel: [  279.797475]  [<ffffffff8173ce70>] ?
retint_restore_args+0x13/0x13
May 17 15:18:02 ib2 kernel: [  279.797539]  [<ffffffff81064ca0>] ?
__init_kthread_worker+0x70/0x70
May 17 15:18:02 ib2 kernel: [  279.797602]  [<ffffffff81746b70>] ?
gs_change+0x13/0x13
May 17 15:18:02 ib2 kernel: [  279.797660] ---[ end trace
a513a4365628073c ]---