I've seen the following crash a few times using iWARP on cxgb3.  The
crash is at the BUG() here:

        switch (cm_id_priv->state) {
        /* ... */
        default:
                BUG();
        }

I added a print to dump the state, and it was 2 (IW_CM_STATE_CONN_RECV)
both times I hit the crash after that.
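
For anyone decoding the state number by hand, here is a minimal user-space
sketch of a lookup helper. It assumes the enum order matches the one in
drivers/infiniband/core/iwcm.h; the only value confirmed by the print above
is 2 == IW_CM_STATE_CONN_RECV. The helper name iw_cm_state_name() is
hypothetical and does not exist in the kernel.

```c
/* Hypothetical user-space helper for decoding a printed state value.
 * Not kernel code: iw_cm_state_name() is an illustrative name only. */
#include <stddef.h>

/* Assumed to mirror enum iw_cm_state in drivers/infiniband/core/iwcm.h.
 * Only IW_CM_STATE_CONN_RECV == 2 is confirmed by the debug print. */
enum iw_cm_state {
        IW_CM_STATE_IDLE,
        IW_CM_STATE_LISTEN,
        IW_CM_STATE_CONN_RECV,
        IW_CM_STATE_CONN_SENT,
        IW_CM_STATE_ESTABLISHED,
        IW_CM_STATE_CLOSING,
        IW_CM_STATE_DESTROYING
};

/* Return a printable name for a state value, or "UNKNOWN" if the
 * value is outside the assumed enum range. */
static const char *iw_cm_state_name(int state)
{
        static const char *const names[] = {
                "IW_CM_STATE_IDLE",
                "IW_CM_STATE_LISTEN",
                "IW_CM_STATE_CONN_RECV",
                "IW_CM_STATE_CONN_SENT",
                "IW_CM_STATE_ESTABLISHED",
                "IW_CM_STATE_CLOSING",
                "IW_CM_STATE_DESTROYING"
        };

        if (state < 0 || (size_t)state >= sizeof(names) / sizeof(names[0]))
                return "UNKNOWN";
        return names[state];
}
```

The same table could be used in a debug print placed just before the
BUG(), so the log shows the state name rather than a bare number.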

The way I've been able to reproduce it is with some buggy code that
does an RDMA read into a memory region without remote write permission
(which works on IB but not iWARP).  The active side issues an RDMA
read, which fails and causes the connection to be torn down, and the
passive side sometimes crashes as shown below.

Anyone have any suggestions on how to track this down?

[11481.273827] ------------[ cut here ]------------
[11481.273827] kernel BUG at drivers/infiniband/core/iwcm.c:790!
[11481.273827] invalid opcode: 0000 [1] SMP
[11481.273827] CPU 3
[11481.273827] Modules linked in: rdma_ucm ib_uverbs nfs lockd nfs_acl fan ac battery ipv6 fuse dm_snapshot dm_mirror dm_mod iw_cxgb3 svcrdma xprtrdma sunrpc rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr loop ide_cd_mod cdrom ide_pci_generic piix iw_nes serio_raw ide_core cxgb3 ib_core evdev thermal pcspkr psmouse ipmi_si ipmi_msghandler ehci_hcd bnx2 zlib_inflate libcrc32c firmware_class container button uhci_hcd processor
[11481.273827] Pid: 2518, comm: iw_cm_wq Not tainted 2.6.25-rc7 #84
[11481.273827] RIP: 0010:[<ffffffff8816ae98>]  [<ffffffff8816ae98>] :iw_cm:cm_work_handler+0x36e/0x42a
[11481.273827] RSP: 0018:ffff8100798e9dc0  EFLAGS: 00010093
[11481.273827] RAX: 0000000000000002 RBX: 0000000000000246 RCX: ffffffff80622f04
[11481.273827] RDX: ffff810079520000 RSI: 0000000000000001 RDI: 0000000000000086
[11481.273828] RBP: ffff8100798e9e50 R08: ffff81007e588d08 R09: ffff81000100a030
[11481.273828] R10: 0000000000000001 R11: 00000000798e9a70 R12: ffff81007e588c00
[11481.273828] R13: 0000000000000000 R14: ffff81007e588d08 R15: ffff8100798e9de0
[11481.273828] FS:  0000000000000000(0000) GS:ffff81007fc02e00(0000) knlGS:0000000000000000
[11481.273828] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[11481.273828] CR2: 00007fed6a22e7c0 CR3: 0000000079596000 CR4: 00000000000006e0
[11481.273828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11481.273828] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11481.273828] Process iw_cm_wq (pid: 2518, threadinfo ffff8100798e8000, task ffff810079520000)
[11481.273828] Stack:  ffff810079dec518 ffff81007e588cb8 ffff81007e588cf8 ffff81007e588cf8
[11481.273828]  0000000000000005 0000000000000000 0000000000000000 0000000000000000
[11481.273828]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[11481.273828] Call Trace:
[11481.273828]  [<ffffffff8816ab2a>] ? :iw_cm:cm_work_handler+0x0/0x42a
[11481.273828]  [<ffffffff8023ec92>] run_workqueue+0xeb/0x200
[11481.273828]  [<ffffffff8023f847>] worker_thread+0xa4/0xb5
[11481.273828]  [<ffffffff80242517>] ? autoremove_wake_function+0x0/0x38
[11481.273828]  [<ffffffff8023f7a3>] ? worker_thread+0x0/0xb5
[11481.273828]  [<ffffffff802423f1>] kthread+0x49/0x78
[11481.273828]  [<ffffffff8020cf38>] child_rip+0xa/0x12
[11481.273828]  [<ffffffff8020c64f>] ? restore_args+0x0/0x30
[11481.273828]  [<ffffffff8024225c>] ? kthreadd+0x157/0x17c
[11481.273828]  [<ffffffff802423a8>] ? kthread+0x0/0x78
[11481.273828]  [<ffffffff8020cf2e>] ? child_rip+0x0/0x12
[11481.273828]
[11481.273828]
[11481.273828] Code: 4c 89 f7 41 c7 44 24 58 00 00 00 00 e8 38 a4 2c f8 4c 89 fe 4c 89 e7 41 ff 14 24 4c 89 f7 41 89 c5 e8 6c a3 2c f8 48 89 c3 eb 07 <0f> 0b eb fe 45 31 ed 48 89 de 4c 89 f7 e8 0c a4 2c f8 eb 04 0f
[11481.273828] RIP  [<ffffffff8816ae98>] :iw_cm:cm_work_handler+0x36e/0x42a
[11481.273828]  RSP <ffff8100798e9dc0>
[11481.282720] ---[ end trace 2667de3bacfc5e84 ]---
