While attempting to set up an ipoib partitioning / bonding environment
with 2.6.27-rc7, I came across a few ipoib crashes, e.g. the two oops
listings below. I understand that Yossi sent some patches just
recently, so they may help; or do they fall into the
non-regression-from-2.6.26 category?
Or.
This one is seen on node startup:
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
ADDRCONF(NETDEV_UP): ib0.8003: link is not ready
------------[ cut here ]------------
kernel BUG at include/linux/netdevice.h:415!
invalid opcode: 0000 [1] SMP CPU 7
Modules linked in: rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm
ib_sa inet_lro ipv6 ib_uverbs ib_umad mlx4_ib ib_mthca ib_mad ib_core
dm_multipath battery ac floppy sr_mod joydev sg igb mlx4_core shpchp
button pcspkr rng_core dm_snapshot dm_zero dm_mirror dm_log dm_mod
usb_storage ata_piix libata sd_mod scsi_mod dock ext3 jbd ehci_hcd
ohci_hcd uhci_hcd [last unloaded: microcode]
Pid: 3035, comm: ipoib Not tainted 2.6.27-rc7 #2
RIP: 0010:[<ffffffffa01f364c>] [<ffffffffa01f364c>] ipoib_open+0x3c/0x150 [ib_ipoib]
RSP: 0018:ffff880229d15e90 EFLAGS: 00010246
RAX: ffff88021f00a878 RBX: ffff88021f00a7a0 RCX: 0000000000000000
RDX: 0003000600000000 RSI: ffff88022e029880 RDI: ffff88021f00a000
RBP: ffff88021f00a780 R08: 0000000000000000 R09: ffffffff805a8e40
R10: 0000000000000000 R11: 0000000000000003 R12: ffff88021f00a000
R13: ffffffffa01f4af2 R14: ffffffff805e32c0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88022f826580(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000008cb170 CR3: 000000022e5d2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ipoib (pid: 3035, threadinfo ffff880229d14000, task ffff88022e195f00)
Stack: ffff88021f00a878 ffff88022d02c780 ffff88021f00a870 ffffffff8023fd92
ffff88022c531d18 ffff88022d02c780 ffff88022d02c7a8 ffff88022c531d18
ffffffff805e0e80 ffffffff80240700 0000000000000000 ffff88022e195f00
Call Trace:
[<ffffffff8023fd92>] ? run_workqueue+0x88/0x118
[<ffffffff80240700>] ? worker_thread+0xd5/0xe0
[<ffffffff80242f41>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff8024062b>] ? worker_thread+0x0/0xe0
[<ffffffff80242e38>] ? kthread+0x47/0x73
[<ffffffff8022d2e4>] ? schedule_tail+0x28/0x60
[<ffffffff8020c179>] ? child_rip+0xa/0x11
[<ffffffff80242df1>] ? kthread+0x0/0x73
[<ffffffff8020c16f>] ? child_rip+0x0/0x11
Code: 07 00 00 53 7e 12 48 8b 75 18 48 c7 c7 ff c5 1f a0 31 c0 e8 e7 eb 03 e0 41 f6 84 24 b0 07 00 00 01 49 8d 9c 24 a0 07 00 00 75 04 <0f> 0b eb fe f0 80 63 10 fe f0 80 8d 80 00 00 00 04 4c 89 e7 e8
RIP [<ffffffffa01f364c>] ipoib_open+0x3c/0x150 [ib_ipoib]
RSP <ffff880229d15e90>
---[ end trace d51c7bec8b19b076 ]---
And this one takes place when you attempt to take ib0 down in the
presence of child devices which are not running; if there are no
child devices, it doesn't happen:
ib0.8003: Failed to modify QP to ERROR state
BUG: soft lockup - CPU#0 stuck for 61s! [ifconfig:7481]
CPU 0:
Modules linked in: autofs4 sunrpc ib_iser iscsi_tcp libiscsi
scsi_transport_iscsi bonding rdma_ucm ib_ucm rdma_cm iw_cm ib_addr ib_ipoib
ib_cm ib_sa inet_lro ipv6 ib_uverbs ib_umad mlx4_ib ib_mthca ib_mad ib_core
dm_multipath battery ac floppy sr_mod igb joydev mlx4_core shpchp sg button
pcspkr rng_core dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage
ata_piix libata sd_mod scsi_mod dock ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last
unloaded: microcode]
Pid: 7481, comm: ifconfig Tainted: G D 2.6.27-rc7 #2
RIP: 0010:[<ffffffff80239a3e>] [<ffffffff80239a3e>] lock_timer_base+0x15/0x4b
RSP: 0018:ffff880213d75c28 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: 0000000000001800 RSI: ffff880213d75c68 RDI: ffff880222cb94d0
RBP: ffff880222cb8000 R08: 0000000000000100 R09: ffff8800280bb900
R10: 0000000000000000 R11: ffffffff8031c680 R12: ffff880222cb8780
R13: ffff880222cb8780 R14: ffff880222cb87a0 R15: ffff88002805cf00
FS: 00007f7f380fc710(0000) GS:ffffffff805a9a80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f52433af000 CR3: 000000021c5cf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff80239a8a>] ? try_to_del_timer_sync+0x16/0x5a
[<ffffffff80239ada>] ? del_timer_sync+0xc/0x16
[<ffffffffa01f44ed>] ? ipoib_ib_dev_stop+0x190/0x26d [ib_ipoib]
[<ffffffff80459c81>] ? _spin_lock_irqsave+0x9/0xe
[<ffffffff80239a4f>] ? lock_timer_base+0x26/0x4b
[<ffffffff8022ad25>] ? default_wake_function+0x0/0xe
[<ffffffff80459c69>] ? _spin_unlock_irq+0x9/0xc
[<ffffffffa01f23ca>] ? ipoib_flush_paths+0x13a/0x145 [ib_ipoib]
[<ffffffffa01f2ab0>] ? ipoib_stop+0x7e/0xf8 [ib_ipoib]
[<ffffffff803e5553>] ? dev_close+0x6f/0x87
[<ffffffff803e5261>] ? dev_change_flags+0xa6/0x15c
[<ffffffffa01f2aea>] ? ipoib_stop+0xb8/0xf8 [ib_ipoib]
[<ffffffff803e5553>] ? dev_close+0x6f/0x87
[<ffffffff803e5261>] ? dev_change_flags+0xa6/0x15c
[<ffffffff80424b68>] ? devinet_ioctl+0x242/0x58a
[<ffffffff803db45d>] ? sock_ioctl+0x1d2/0x1f9
[<ffffffff80291e31>] ? vfs_ioctl+0x21/0x6b
[<ffffffff802920d4>] ? do_vfs_ioctl+0x259/0x272
[<ffffffff8029213e>] ? sys_ioctl+0x51/0x73