On 25 July 2012 01:16, Joseph Glanville <[email protected]> wrote:
> Hi guys,
>
> I have been seeing this kernel panic about every 3 days on our staging
> cluster. I am not sure what the root cause is, but I assume it is a bug
> in SCST.
> The kernel is 3.2.14 with the Ubuntu patch series and Bart's SRP HA
> patches applied.
>
> The SRP connection settings are still at their defaults; at this stage
> we are only using the added ability to delete SRP connections without
> unloading the module.
>
> [35404.804901] IP: [<          (null)>]           (null)
> [35404.804981] PGD 2ab2b067 PUD 75f5b067 PMD 0
> [35404.805064] Oops: 0010 [#1] SMP
> [35404.805140] CPU 0
> [35404.805149] Modules linked in: tun xen_netback xen_blkback dm_round_robin ib_srpt(O) scst_vdisk(O) scst(O) bonding dm_multipath flashcache(O) raid0 raid1 md_mod
> [35404.805463]
> [35404.805528] Pid: 4585, comm: srpt_mlx4_0-2 Tainted: G           O 3.2.14+ #2 Dell                   PowerEdge C2100       /0P19C9
> [35404.805690] RIP: e030:[<0000000000000000>]  [<          (null)>]           (null)
> [35404.805832] RSP: e02b:ffff8800bf42ace0  EFLAGS: 00010046
> [35404.805910] RAX: ffff88001ac800c0 RBX: ffff88001ac0c4d0 RCX: ffff88001ac0d600
> [35404.805994] RDX: ffff88001ac0dc30 RSI: ffff88001ac800c0 RDI: ffff88001654e900
> [35404.806078] RBP: ffff8800bf42adb8 R08: ffff88001654e900 R09: ffff88001ac0d608
> [35404.806162] R10: 0000000000000001 R11: ffff88001ac0d5f8 R12: ffff88009c443940
> [35404.806263] R13: ffff88001b1a2000 R14: 00000000000004c8 R15: ffff88001ac0c4d0
> [35404.806350] FS:  00007f2701406700(0000) GS:ffff8800bf427000(0000) knlGS:0000000000000000
> [35404.806492] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [35404.806571] CR2: 0000000000000000 CR3: 00000000830e0000 CR4: 0000000000002660
> [35404.806655] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [35404.806740] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [35404.806825] Process srpt_mlx4_0-2 (pid: 4585, threadinfo ffff8800b50f4000, task ffff880017faeea0)
> [35404.806969] Stack:
> [35404.807034]  ffffffff8150285e 0000000000000000 ffff88001ac0c998 ffff880000000001
> [35404.807183]  ffff88001ac0d608 ffff88001654e900 ffff88001ac0e3f0 ffff88001ac0e3b8
> [35404.807332]  ffff88001ac0c528 ffff880068a11600 ffff88001ac0c4e0 ffff88001ac0c4f0
> [35404.807480] Call Trace:
> [35404.807548]  <IRQ>
> [35404.807637]  [<ffffffff8150285e>] ? srp_recv_completion+0x44e/0x650
> [35404.807722]  [<ffffffff81009f52>] ? check_events+0x12/0x20
> [35404.807803]  [<ffffffff814ea3c2>] mlx4_ib_cq_comp+0x12/0x20
> [35404.807883]  [<ffffffff81433beb>] mlx4_cq_completion+0x3b/0x80
> [35404.807964]  [<ffffffff81434aa4>] mlx4_eq_int+0x224/0x290
> [35404.808043]  [<ffffffff81434b81>] mlx4_interrupt+0x51/0x80
> [35404.808125]  [<ffffffff810b72bd>] handle_irq_event_percpu+0x5d/0x210
> [35404.808208]  [<ffffffff810b74bc>] handle_irq_event+0x4c/0x80
> [35404.808289]  [<ffffffff810ba233>] handle_fasteoi_irq+0x83/0x140
> [35404.808371]  [<ffffffff8130f756>] __xen_evtchn_do_upcall+0x1a6/0x260
> [35404.808455]  [<ffffffff813114fa>] xen_evtchn_do_upcall+0x2a/0x40
> [35404.808538]  [<ffffffff816846fe>] xen_do_hypervisor_callback+0x1e/0x30
> [35404.808620]  <EOI>
> [35404.808691]  [<ffffffffa006b0e7>] ? scst_register_virtual_device+0x5d7/0x750 [scst]
> [35404.808833]  [<ffffffffa007a473>] ? scst_cmd_init_done+0xb3/0x5a0 [scst]
> [35404.808917]  [<ffffffffa00f0bed>] ? 0xffffffffa00f0bec
> [35404.809006]  [<ffffffffa0072a47>] ? scst_rx_cmd+0xe7/0xce0 [scst]
> [35404.809088]  [<ffffffffa00f2872>] ? 0xffffffffa00f2871
> [35404.809166]  [<ffffffffa00f08e3>] ? 0xffffffffa00f08e2
> [35404.809245]  [<ffffffffa00f797f>] ? 0xffffffffa00f797e
> [35404.810244]  [<ffffffffa00f081f>] ? 0xffffffffa00f081e
> [35404.810323]  [<ffffffffa00f7af0>] ? 0xffffffffa00f7aef
> [35404.810417]  [<ffffffffa00f873f>] ? 0xffffffffa00f873e
> [35404.810495]  [<ffffffffa00f87a0>] ? 0xffffffffa00f879f
> [35404.810573]  [<ffffffffa00f8880>] ? 0xffffffffa00f887f
> [35404.810655]  [<ffffffff8167a8d9>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> [35404.810739]  [<ffffffffa00f87a0>] ? 0xffffffffa00f879f
> [35404.810818]  [<ffffffff81077246>] ? kthread+0x96/0xa0
> [35404.810896]  [<ffffffff816845b4>] ? kernel_thread_helper+0x4/0x10
> [35404.810979]  [<ffffffff81682673>] ? int_ret_from_sys_call+0x7/0x1b
> [35404.811061]  [<ffffffff8167ab7c>] ? retint_restore_args+0x5/0x6
> [35404.811142]  [<ffffffff816845b0>] ? gs_change+0x13/0x13
> [35404.811219] Code:  Bad RIP value.
> [35404.811297] RIP  [<          (null)>]           (null)
> [35404.811377]  RSP <ffff8800bf42ace0>
> [35404.811447] CR2: 0000000000000000
> [35404.811739] ---[ end trace a002a9122b31526a ]---
> [35404.811841] Kernel panic - not syncing: Fatal exception in interrupt
> [35404.811950] Pid: 4585, comm: srpt_mlx4_0-2 Tainted: G      D    O 3.2.14+ #2
> [35404.812061] Call Trace:
> [35404.812155]  <IRQ>  [<ffffffff81677b48>] panic+0x8c/0x19d
> [35404.812296]  [<ffffffff81009f52>] ? check_events+0x12/0x20
> [35404.812402]  [<ffffffff8167b7fa>] oops_end+0xea/0xf0
> [35404.812510]  [<ffffffff8103b5f2>] no_context+0xf2/0x270
> [35404.812616]  [<ffffffff8103b895>] __bad_area_nosemaphore+0x125/0x210
> [35404.812726]  [<ffffffff8103b98e>] bad_area_nosemaphore+0xe/0x10
> [35404.812835]  [<ffffffff8167e135>] do_page_fault+0x335/0x4d0
> [35404.812942]  [<ffffffff8100984d>] ? xen_force_evtchn_callback+0xd/0x10
> [35404.813052]  [<ffffffff81009f52>] ? check_events+0x12/0x20
> [35404.813174]  [<ffffffff8167adf5>] page_fault+0x25/0x30
> [35404.813280]  [<ffffffff8150285e>] ? srp_recv_completion+0x44e/0x650
> [35404.813390]  [<ffffffff81009f52>] ? check_events+0x12/0x20
> [35404.813496]  [<ffffffff814ea3c2>] mlx4_ib_cq_comp+0x12/0x20
> [35404.813603]  [<ffffffff81433beb>] mlx4_cq_completion+0x3b/0x80
> [35404.813711]  [<ffffffff81434aa4>] mlx4_eq_int+0x224/0x290
> [35404.813817]  [<ffffffff81434b81>] mlx4_interrupt+0x51/0x80
> [35404.813924]  [<ffffffff810b72bd>] handle_irq_event_percpu+0x5d/0x210
> [35404.814034]  [<ffffffff810b74bc>] handle_irq_event+0x4c/0x80
> [35404.814141]  [<ffffffff810ba233>] handle_fasteoi_irq+0x83/0x140
> [35404.814250]  [<ffffffff8130f756>] __xen_evtchn_do_upcall+0x1a6/0x260
> [35404.814360]  [<ffffffff813114fa>] xen_evtchn_do_upcall+0x2a/0x40
> [35404.814469]  [<ffffffff816846fe>] xen_do_hypervisor_callback+0x1e/0x30
> [35404.814600]  <EOI>  [<ffffffffa006b0e7>] ? scst_register_virtual_device+0x5d7/0x750 [scst]
> [35404.814806]  [<ffffffffa007a473>] ? scst_cmd_init_done+0xb3/0x5a0 [scst]
> [35404.814916]  [<ffffffffa00f0bed>] ? 0xffffffffa00f0bec
> [35404.815022]  [<ffffffffa0072a47>] ? scst_rx_cmd+0xe7/0xce0 [scst]
> [35404.815131]  [<ffffffffa00f2872>] ? 0xffffffffa00f2871
> [35404.815236]  [<ffffffffa00f08e3>] ? 0xffffffffa00f08e2
> [35404.815341]  [<ffffffffa00f797f>] ? 0xffffffffa00f797e
> [35404.815447]  [<ffffffffa00f081f>] ? 0xffffffffa00f081e
> [35404.815551]  [<ffffffffa00f7af0>] ? 0xffffffffa00f7aef
> [35404.815656]  [<ffffffffa00f873f>] ? 0xffffffffa00f873e
> [35404.815761]  [<ffffffffa00f87a0>] ? 0xffffffffa00f879f
> [35404.815869]  [<ffffffffa00f8880>] ? 0xffffffffa00f887f
> [35404.815985]  [<ffffffff8167a8d9>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> [35404.816096]  [<ffffffffa00f87a0>] ? 0xffffffffa00f879f
> [35404.816201]  [<ffffffff81077246>] ? kthread+0x96/0xa0
> [35404.816306]  [<ffffffff816845b4>] ? kernel_thread_helper+0x4/0x10
> [35404.816414]  [<ffffffff81682673>] ? int_ret_from_sys_call+0x7/0x1b
> [35404.816523]  [<ffffffff8167ab7c>] ? retint_restore_args+0x5/0x6
> [35404.816631]  [<ffffffff816845b0>] ? gs_change+0x13/0x13
>
> Joseph.
>
> --
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846

Sorry, I left out the first line of the oops:

[35404.804723] BUG: unable to handle kernel NULL pointer dereference at           (null)
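
For anyone triaging the trace, here is a minimal sketch (not part of the original report) of one way to separate the frames the unwinder verified from the ones marked with `?`. The names `FRAME_RE` and `parse_frames` are made up for illustration, and the code assumes unwrapped dmesg lines without the mail quoting prefix.

```python
import re

# Matches one call-trace entry such as
#   [<ffffffff814ea3c2>] mlx4_ib_cq_comp+0x12/0x20
# An optional "? " marks a return address that was found on the stack
# but that the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return (symbol, verified) tuples for each symbolised trace entry."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("sym"), m.group("guess") is None))
    return frames


# A few lines copied from the trace above.
sample = """\
[35404.807548]  <IRQ>
[35404.807637]  [<ffffffff8150285e>] ? srp_recv_completion+0x44e/0x650
[35404.807722]  [<ffffffff81009f52>] ? check_events+0x12/0x20
[35404.807803]  [<ffffffff814ea3c2>] mlx4_ib_cq_comp+0x12/0x20
[35404.807883]  [<ffffffff81433beb>] mlx4_cq_completion+0x3b/0x80
"""

verified = [sym for sym, ok in parse_frames(sample) if ok]
print(verified)
```

Entries marked `?` are not necessarily wrong; they are addresses seen on the stack that could not be confirmed as part of the call chain, so they are worth keeping in view. Entries without a symbol name (the bare `0xffffffffa00f....` ones) are skipped by this sketch.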

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846