On Fri, Jul 10, 2009 at 10:42 PM, Roland Dreier <rdre...@cisco.com> wrote:
>
>  > Thanks for the patch. With the patch applied the lockdep warning
>  > indeed occurs sooner and the output is now indeed shorter. You can
>  > find the new lockdep output here:
>  > http://bugzilla.kernel.org/attachment.cgi?id=22305.
>
> Thanks, that actually looks like a completely different issue (that I
> can actually understand). I was able to reproduce that here: the issue
> is doing skb_orphan() inside of priv->lock, and the network stack
> locking is not irq-safe. So the following hacky patch fixes that.
>
> This would be a short-term solution for the immediate issue at least. A
> better solution would be if we didn't need to make priv->lock
> hardirq-safe: the only place that requires it is the QP event handler in
> ipoib_cm.c, and that might be a little dicey to fix. Need to think about that.
>
> However with this patch applied I don't see any further lockdep reports
> here. It would be great if you could retest yet again with this applied
> (on top of my earlier patch to make priv->lock hardirq-safe as early as
> possible).

Hello Roland,

Sorry, but I'm afraid that the two kernel patches posted in this thread
are not sufficient to fix all outstanding locking issues in the 2.6.30
IB subsystem. I encountered the following kernel messages today:

OpenSM[8074]: SM port is down
OpenSM[8074]: SM port is down
OpenSM[8074]: SM port is down
OpenSM[8074]: Entering MASTER state
ib_srpt: ASYNC event= 17 on device= mlx4_0
ib_srpt: ASYNC event= 11 on device= mlx4_0
ib_srpt: ASYNC event= 9 on device= mlx4_0
OpenSM[8074]: SUBNET UP
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
2.6.30.3-scst-debug #1
------------------------------------------------------
firefox/4069 [HC0[0]:SC1[2]:HE0:SE0] is trying to acquire:
 (&mad_agent_priv->lock){..-...}, at: [<ffffffffa04395e2>] ib_post_send_mad+0xe2/0x7d0 [ib_mad]

and this task is already holding:
 (&priv->lock){-.-...}, at: [<ffffffffa047bb4d>] ipoib_path_lookup+0x4d/0x2f0 [ib_ipoib]
which would create a new lock dependency:
 (&priv->lock){-.-...} -> (&mad_agent_priv->lock){..-...}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&priv->lock){-.-...}
... which became HARDIRQ-irq-safe at:
  [<ffffffffffffffff>] 0xffffffffffffffff

to a HARDIRQ-irq-unsafe lock:
 (&(&mad_agent_priv->timed_work)->timer){+.-...}
... which became HARDIRQ-irq-unsafe at:
...  [<ffffffffffffffff>] 0xffffffffffffffff

[ ... ]

stack backtrace:
Pid: 4069, comm: firefox Not tainted 2.6.30.3-scst-debug #1
Call Trace:
 <IRQ>  [<ffffffff8027352a>] check_usage+0x3ba/0x470
 [<ffffffff80273644>] check_irq_usage+0x64/0x100
 [<ffffffff802746d9>] __lock_acquire+0xff9/0x1c80
 [<ffffffff80275468>] lock_acquire+0x108/0x150
 [<ffffffffa04395e2>] ? ib_post_send_mad+0xe2/0x7d0 [ib_mad]
 [<ffffffff80515061>] _spin_lock_irqsave+0x41/0x60
 [<ffffffffa04395e2>] ? ib_post_send_mad+0xe2/0x7d0 [ib_mad]
 [<ffffffffa04395e2>] ib_post_send_mad+0xe2/0x7d0 [ib_mad]
 [<ffffffff8037c39c>] ? idr_get_new_above_int+0x1c/0x90
 [<ffffffffa04659d4>] send_mad+0xb4/0x110 [ib_sa]
 [<ffffffffa04223ef>] ? ib_pack+0x17f/0x210 [ib_core]
 [<ffffffffa046613d>] ib_sa_path_rec_get+0x1ed/0x260 [ib_sa]
 [<ffffffffa047afa9>] path_rec_start+0x89/0xf0 [ib_ipoib]
 [<ffffffffa047bdf0>] ? path_rec_completion+0x0/0x540 [ib_ipoib]
 [<ffffffffa047bdc9>] ipoib_path_lookup+0x2c9/0x2f0 [ib_ipoib]
 [<ffffffffa047c5bd>] ipoib_start_xmit+0x17d/0x440 [ib_ipoib]
 [<ffffffff80488bfd>] dev_hard_start_xmit+0x2bd/0x340
 [<ffffffff80488997>] ? dev_hard_start_xmit+0x57/0x340
 [<ffffffff8049d4be>] __qdisc_run+0x25e/0x2b0
 [<ffffffff804890a0>] dev_queue_xmit+0x2f0/0x4c0
 [<ffffffff80488e02>] ? dev_queue_xmit+0x52/0x4c0
 [<ffffffff8048f489>] neigh_connected_output+0xa9/0xe0
 [<ffffffff804911f5>] neigh_update+0x265/0x510
 [<ffffffff804909f9>] ? neigh_lookup+0x129/0x160
 [<ffffffff804d4332>] arp_process+0x392/0x8c0
 [<ffffffff804d3fa0>] ? arp_process+0x0/0x8c0
 [<ffffffff802726bd>] ? trace_hardirqs_on_caller+0x6d/0x1a0
 [<ffffffff804d4989>] arp_rcv+0x119/0x130
 [<ffffffff80487892>] netif_receive_skb+0x392/0x4e0
 [<ffffffff80487610>] ? netif_receive_skb+0x110/0x4e0
 [<ffffffffa047dfd6>] ipoib_ib_handle_rx_wc+0x166/0x2a0 [ib_ipoib]
 [<ffffffffa047f771>] ipoib_poll+0x181/0x1e0 [ib_ipoib]
 [<ffffffff80485fda>] net_rx_action+0x17a/0x260
 [<ffffffff80485f53>] ? net_rx_action+0xf3/0x260
 [<ffffffff8024ef49>] ? __do_softirq+0x59/0x230
 [<ffffffff8024efdf>] __do_softirq+0xef/0x230
 [<ffffffff8020d0fc>] call_softirq+0x1c/0x30
 [<ffffffff8020ee95>] do_softirq+0x75/0xb0
 [<ffffffff8024eaa5>] irq_exit+0x95/0xa0
 [<ffffffff8020e61d>] do_IRQ+0x8d/0xf0
 [<ffffffff8020c913>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80514cf1>] ? _spin_unlock_irq+0x31/0x60
 [<ffffffff8023fcc9>] ? finish_task_switch+0x89/0x110
 [<ffffffff8023fc86>] ? finish_task_switch+0x46/0x110
 [<ffffffffa0438d00>] ? ib_mad_completion_handler+0x0/0x800 [ib_mad]
 [<ffffffff80511737>] ? thread_return+0x52/0x85b
 [<ffffffff8051472e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8027279d>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8051472e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff80511f53>] ? schedule+0x13/0x40
 [<ffffffff8020ca04>] ? retint_careful+0x12/0x2e
ib0: no IPv6 routers present

Bart.