Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
* Michael S. Tsirkin <[EMAIL PROTECTED]> wrote: > After adding some printks, I started getting these: > > [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! this should go away if you double the size of MAX_STACK_TRACE_ENTRIES in kernel/lockdep_internals.h. (keep it a power of two) If it doesnt go away then it might signal some sort of leak. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
After adding some printks, I started getting these: [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! [ 597.041546] turning off the locking correctness validator. [ 597.047135] [] save_trace+0x8a/0x8f [ 597.051751] [] mark_lock+0x65/0x3ff [ 597.056366] [] save_trace+0x3e/0x8f [ 597.060980] [] add_lock_to_list+0x62/0x85 [ 597.066116] [] __lock_acquire+0x3f4/0xbb9 [ 597.071252] [] send_mad+0x79/0x103 [ib_sa] [ 597.076474] [] idr_get_new_above_int+0x13c/0x216 [ 597.082225] [] lock_acquire+0x56/0x71 [ 597.087018] [] send_mad+0x79/0x103 [ib_sa] [ 597.092240] [] _spin_lock_irqsave+0x32/0x41 [ 597.097547] [] send_mad+0x79/0x103 [ib_sa] [ 597.102770] [] send_mad+0x79/0x103 [ib_sa] [ 597.107989] [] ib_sa_path_rec_get+0x134/0x172 [ib_sa] [ 597.114166] [] path_rec_start+0x115/0x143 [ib_ipoib] [ 597.120254] [] path_rec_completion+0x0/0x4f4 [ib_ipoib] [ 597.126610] [] path_rec_create+0x77/0x9d [ib_ipoib] [ 597.132617] [] ipoib_start_xmit+0x441/0x57b [ib_ipoib] [ 597.13] [] _spin_unlock_irqrestore+0x34/0x39 [ 597.144635] [] trace_hardirqs_on+0x106/0x141 [ 597.150035] [] dev_queue_xmit+0x109/0x245 [ 597.155167] [] __mod_timer+0x94/0x9e [ 597.159871] [] dev_hard_start_xmit+0x1be/0x21d [ 597.165438] [] __qdisc_run+0xd7/0x190 [ 597.170226] [] dev_queue_xmit+0x135/0x245 [ 597.175360] [] arp_process+0x2c0/0x512 [ 597.180234] [] mthca_tavor_interrupt+0xf3/0x12b [ib_mthca] [ 597.186855] [] netif_receive_skb+0x1c4/0x1da [ 597.192254] [] trace_hardirqs_on+0x106/0x141 [ 597.197648] [] process_backlog+0x94/0x107 [ 597.202785] [] net_rx_action+0x9a/0x15e [ 597.207743] [] __do_softirq+0x69/0xd5 [ 597.212530] [] do_softirq+0x37/0x4d [ 597.217147] [] do_IRQ+0x5c/0x72 [ 597.221415] [] common_interrupt+0x2e/0x34 [ 597.226549] [] default_idle+0x3b/0x54 [ 597.231337] [] default_idle+0x3d/0x54 [ 597.236124] [] cpu_idle+0xa2/0xbb [ 597.240567] === And sometimes these: [ 404.493572] KERNEL: assertion (!timer_pending(>watchdog_timer)) failed at net/sched/sch_generic.c (608) -- MST - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: > > Quoting Roland Dreier <[EMAIL PROTECTED]>: > > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! > > > > >Feb 27 17:47:52 sw169 kernel: [] > > >_spin_lock_irqsave+0x15/0x24 > > >Feb 27 17:47:52 sw169 kernel: [] > > >:ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 > > > > It looks like this is deadlocking trying to take priv->lock in > > ipoib_neigh_destructor(). > > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING > > turned on, and then rerun this test. There's a good chance that this would > > diagnose the deadlock. (I don't have good access to my test machines right > > now, or > > else I would do it myself) > > OK, I did that. But I get > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator. > and I am not sure what triggers this, or how to fix it to have the > validator actually do its job. It usually indicates a spinlock is not properly initialized. Like __SPIN_LOCK_UNLOCKED() used in a non-static context, use spin_lock_init() in these cases. However looking at the code, ipoib_neight_destructor only uses >lock, and that seems to get properly initialized in ipoib_setup() using spin_lock_init(). So either there are other sites that instanciate those objects and forget about the lock init, or the object is corrupted (use after free?) > Ingo, what key does the message refer to? > > The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c > line > 829. > > Full message below: > > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator. > [13440.778008] [] __lock_acquire+0xae4/0xbb9 > [13440.783078] [] lock_acquire+0x56/0x71 > [13440.787784] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.794412] [] _spin_lock_irqsave+0x32/0x41 > [13440.799649] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.806275] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.812897] [] dst_run_gc+0xc/0x118 > [13440.817439] [] run_timer_softirq+0x37/0x16b > [13440.822673] [] dst_run_gc+0x0/0x118 > [13440.827221] [] neigh_destroy+0xbe/0x104 > [13440.832114] [] dst_destroy+0x4d/0xab > [13440.836751] [] dst_run_gc+0x55/0x118 > [13440.841384] [] run_timer_softirq+0x108/0x16b > [13440.846711] [] __do_softirq+0x5a/0xd5 > [13440.851427] [] trace_hardirqs_on+0x106/0x141 > [13440.856754] [] __do_softirq+0x69/0xd5 > [13440.861470] [] do_softirq+0x37/0x4d > [13440.866016] [] smp_apic_timer_interrupt+0x6b/0x77 > [13440.871774] [] default_idle+0x3b/0x54 > [13440.876491] [] default_idle+0x3b/0x54 > [13440.881211] [] apic_timer_interrupt+0x33/0x38 > [13440.886624] [] default_idle+0x3b/0x54 > [13440.891342] [] default_idle+0x3d/0x54 > [13440.896061] [] cpu_idle+0xa2/0xbb > [13440.900436] === > [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880 > [13768.717353] [] _raw_spin_lock+0xda/0xfd > [13768.722247] [] _spin_lock_irqsave+0x39/0x41 > [13768.727486] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.734110] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.740735] [] dst_run_gc+0xc/0x118 > [13768.745276] [] run_timer_softirq+0x37/0x16b > [13768.750517] [] dst_run_gc+0x0/0x118 > [13768.755061] [] neigh_destroy+0xbe/0x104 > [13768.759955] [] dst_destroy+0x4d/0xab > [13768.764586] [] dst_run_gc+0x55/0x118 > [13768.769218] [] run_timer_softirq+0x108/0x16b > [13768.774542] [] __do_softirq+0x5a/0xd5 > [13768.779261] [] trace_hardirqs_on+0x106/0x141 > [13768.784588] [] __do_softirq+0x69/0xd5 > [13768.789308] [] do_softirq+0x37/0x4d > [13768.793851] [] smp_apic_timer_interrupt+0x6b/0x77 > [13768.799609] [] default_idle+0x3b/0x54 > [13768.804326] [] default_idle+0x3b/0x54 > [13768.809054] [] apic_timer_interrupt+0x33/0x38 > [13768.814471] [] default_idle+0x3b/0x54 > [13768.819187] [] default_idle+0x3d/0x54 > [13768.823903] [] cpu_idle+0xa2/0xbb > [13768.828279] === > > > -- > MST > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
> Quoting Michael S. Tsirkin <[EMAIL PROTECTED]>: > Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft > lockup detected on CPU#0!) > > > After adding some printks, I started getting these: > > [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! > [ 597.041546] turning off the locking correctness validator. I looked at our stack usage a bit. It seems some work is in order. $ make checkstack | grep ib_ 0x0603 mthca_init_hca [ib_mthca]: 764 0x14ed mthca_init_hca [ib_mthca]: 764 0x65ae ipoib_cm_tx_start [ib_ipoib]:368 0x6b0b ipoib_cm_tx_start [ib_ipoib]:368 0x135f ib_uverbs_query_device [ib_uverbs]: 348 0x15f9 ib_uverbs_query_device [ib_uverbs]: 348 0x05d0 ib_ucm_init_qp_attr [ib_ucm]:300 0x0697 ib_ucm_init_qp_attr [ib_ucm]:300 0x7f9e ipoib_path_seq_show [ib_ipoib]: 264 0x8092 ipoib_path_seq_show [ib_ipoib]: 264 0x5b56 ipoib_cm_rx_handler [ib_ipoib]: 220 0x5eec ipoib_cm_rx_handler [ib_ipoib]: 220 0x7934 ipoib_cm_tx_handler [ib_ipoib]: 208 0x7ce0 ipoib_cm_tx_handler [ib_ipoib]: 208 0x32fe ib_uverbs_create_qp [ib_uverbs]: 192 0x36fd ib_uverbs_create_qp [ib_uverbs]: 192 0x28a9 srp_reset_host [ib_srp]: 192 0x2a96 srp_reset_host [ib_srp]: 192 0x1c99 show_sys_image_guid [ib_core]: 188 0x1d2b show_sys_image_guid [ib_core]: 188 0x01f9 ib_sa_service_rec_callback [ib_sa]: 180 0x0234 ib_sa_service_rec_callback [ib_sa]: 180 0x1b3c path_rec_completion [ib_ipoib]: 180 0x2020 path_rec_completion [ib_ipoib]: 180 0x70cf ipoib_cm_handle_rx_wc [ib_ipoib]:180 0x7402 ipoib_cm_handle_rx_wc [ib_ipoib]:180 0x09a7 srp_create_target [ib_srp]: 176 0x125f srp_create_target [ib_srp]: 176 0x0d9d ib_cm_listen [ib_cm]:172 0x10b3 ib_cm_listen [ib_cm]:172 0x4455 ipoib_mcast_send [ib_ipoib]: 172 0x48e0 ipoib_mcast_send [ib_ipoib]: 172 0x15c1 ipoib_start_xmit [ib_ipoib]: 164 0x1b2d ipoib_start_xmit [ib_ipoib]: 164 0x56c8 mthca_make_profile [ib_mthca]: 160 0x6051 mthca_make_profile [ib_mthca]: 160 0x2abb ipoib_ib_dev_stop [ib_ipoib]:160 0x2d19 ipoib_ib_dev_stop [ib_ipoib]:160 0x202b ib_uverbs_query_qp [ib_uverbs]: 156 0x22c0 ib_uverbs_query_qp [ib_uverbs]: 156 0x5269 ipoib_init_qp [ib_ipoib]:152 0x53bc ipoib_init_qp [ib_ipoib]:152 0x327f ipoib_mcast_join [ib_ipoib]: 144 0x349d ipoib_mcast_join [ib_ipoib]: 144 0x2092 ib_find_send_mad [ib_mad]: 140 0x23fa ib_find_send_mad [ib_mad]: 140 0x22cf ib_uverbs_modify_qp [ib_uverbs]: 140 0x24f2 ib_uverbs_modify_qp [ib_uverbs]: 140 0xbc8e mthca_modify_qp [ib_mthca]: 136 0xc9cc mthca_modify_qp [ib_mthca]: 136 0x00010cb1 mthca_reg_phys_mr [ib_mthca]:136 0x0001117a mthca_reg_phys_mr [ib_mthca]:136 0x35b4 ipoib_mcast_join_finish [ib_ipoib]: 136 0x3a33 ipoib_mcast_join_finish [ib_ipoib]: 136 0x0793 iser_cma_handler [ib_iser]: 132 0x0bc1 iser_cma_handler [ib_iser]: 132 0x1e37 srp_queuecommand [ib_srp]: 132 0x273b srp_queuecommand [ib_srp]: 132 0x8a5a mthca_poll_cq [ib_mthca]:128 0x9204 mthca_poll_cq [ib_mthca]:128 0x3a42 ipoib_mcast_join_complete [ib_ipoib]:128 0x3e6e ipoib_mcast_join_complete [ib_ipoib]:128 0x4a58 ipoib_mcast_restart_task [ib_ipoib]: 128 0x4eb8 ipoib_mcast_restart_task [ib_ipoib]: 128 0x38e6 ib_uverbs_create_ah [ib_uverbs]: 116 0x3ac4 ib_uverbs_create_ah [ib_uverbs]: 116 0xf6a5 mthca_process_mad [ib_mthca]:116 0xfa93 mthca_process_mad [ib_mthca]:116 0x11ef mcast_work_handler [ib_sa]: 112 0x16e6 mcast_work_handler [ib_sa]: 112 0x0a20 ib_ucm_send_req [ib_ucm]:112 0x0b7c ib_ucm_send_req [ib_ucm]:112 0x1697 ib_post_send_mad [ib_mad]: 112 0x1b05 ib_post_send_mad [ib_mad]: 112 0x030e iser_post_send [ib_iser]:112 0x03c5 iser_post_send
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
Quoting Michael S. Tsirkin [EMAIL PROTECTED]: Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!) After adding some printks, I started getting these: [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! [ 597.041546] turning off the locking correctness validator. I looked at our stack usage a bit. It seems some work is in order. $ make checkstack | grep ib_ 0x0603 mthca_init_hca [ib_mthca]: 764 0x14ed mthca_init_hca [ib_mthca]: 764 0x65ae ipoib_cm_tx_start [ib_ipoib]:368 0x6b0b ipoib_cm_tx_start [ib_ipoib]:368 0x135f ib_uverbs_query_device [ib_uverbs]: 348 0x15f9 ib_uverbs_query_device [ib_uverbs]: 348 0x05d0 ib_ucm_init_qp_attr [ib_ucm]:300 0x0697 ib_ucm_init_qp_attr [ib_ucm]:300 0x7f9e ipoib_path_seq_show [ib_ipoib]: 264 0x8092 ipoib_path_seq_show [ib_ipoib]: 264 0x5b56 ipoib_cm_rx_handler [ib_ipoib]: 220 0x5eec ipoib_cm_rx_handler [ib_ipoib]: 220 0x7934 ipoib_cm_tx_handler [ib_ipoib]: 208 0x7ce0 ipoib_cm_tx_handler [ib_ipoib]: 208 0x32fe ib_uverbs_create_qp [ib_uverbs]: 192 0x36fd ib_uverbs_create_qp [ib_uverbs]: 192 0x28a9 srp_reset_host [ib_srp]: 192 0x2a96 srp_reset_host [ib_srp]: 192 0x1c99 show_sys_image_guid [ib_core]: 188 0x1d2b show_sys_image_guid [ib_core]: 188 0x01f9 ib_sa_service_rec_callback [ib_sa]: 180 0x0234 ib_sa_service_rec_callback [ib_sa]: 180 0x1b3c path_rec_completion [ib_ipoib]: 180 0x2020 path_rec_completion [ib_ipoib]: 180 0x70cf ipoib_cm_handle_rx_wc [ib_ipoib]:180 0x7402 ipoib_cm_handle_rx_wc [ib_ipoib]:180 0x09a7 srp_create_target [ib_srp]: 176 0x125f srp_create_target [ib_srp]: 176 0x0d9d ib_cm_listen [ib_cm]:172 0x10b3 ib_cm_listen [ib_cm]:172 0x4455 ipoib_mcast_send [ib_ipoib]: 172 0x48e0 ipoib_mcast_send [ib_ipoib]: 172 0x15c1 ipoib_start_xmit [ib_ipoib]: 164 0x1b2d ipoib_start_xmit [ib_ipoib]: 164 0x56c8 mthca_make_profile [ib_mthca]: 160 0x6051 mthca_make_profile [ib_mthca]: 160 0x2abb ipoib_ib_dev_stop [ib_ipoib]:160 0x2d19 ipoib_ib_dev_stop [ib_ipoib]:160 0x202b ib_uverbs_query_qp [ib_uverbs]: 156 0x22c0 ib_uverbs_query_qp [ib_uverbs]: 156 0x5269 ipoib_init_qp [ib_ipoib]:152 0x53bc ipoib_init_qp [ib_ipoib]:152 0x327f ipoib_mcast_join [ib_ipoib]: 144 0x349d ipoib_mcast_join [ib_ipoib]: 144 0x2092 ib_find_send_mad [ib_mad]: 140 0x23fa ib_find_send_mad [ib_mad]: 140 0x22cf ib_uverbs_modify_qp [ib_uverbs]: 140 0x24f2 ib_uverbs_modify_qp [ib_uverbs]: 140 0xbc8e mthca_modify_qp [ib_mthca]: 136 0xc9cc mthca_modify_qp [ib_mthca]: 136 0x00010cb1 mthca_reg_phys_mr [ib_mthca]:136 0x0001117a mthca_reg_phys_mr [ib_mthca]:136 0x35b4 ipoib_mcast_join_finish [ib_ipoib]: 136 0x3a33 ipoib_mcast_join_finish [ib_ipoib]: 136 0x0793 iser_cma_handler [ib_iser]: 132 0x0bc1 iser_cma_handler [ib_iser]: 132 0x1e37 srp_queuecommand [ib_srp]: 132 0x273b srp_queuecommand [ib_srp]: 132 0x8a5a mthca_poll_cq [ib_mthca]:128 0x9204 mthca_poll_cq [ib_mthca]:128 0x3a42 ipoib_mcast_join_complete [ib_ipoib]:128 0x3e6e ipoib_mcast_join_complete [ib_ipoib]:128 0x4a58 ipoib_mcast_restart_task [ib_ipoib]: 128 0x4eb8 ipoib_mcast_restart_task [ib_ipoib]: 128 0x38e6 ib_uverbs_create_ah [ib_uverbs]: 116 0x3ac4 ib_uverbs_create_ah [ib_uverbs]: 116 0xf6a5 mthca_process_mad [ib_mthca]:116 0xfa93 mthca_process_mad [ib_mthca]:116 0x11ef mcast_work_handler [ib_sa]: 112 0x16e6 mcast_work_handler [ib_sa]: 112 0x0a20 ib_ucm_send_req [ib_ucm]:112 0x0b7c ib_ucm_send_req [ib_ucm]:112 0x1697 ib_post_send_mad [ib_mad]: 112 0x1b05 ib_post_send_mad [ib_mad]: 112 0x030e iser_post_send [ib_iser]:112 0x03c5 iser_post_send [ib_iser]:112 0x1605
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: Quoting Roland Dreier [EMAIL PROTECTED]: Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! Feb 27 17:47:52 sw169 kernel: [8053aaf1] _spin_lock_irqsave+0x15/0x24 Feb 27 17:47:52 sw169 kernel: [88067a23] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 It looks like this is deadlocking trying to take priv-lock in ipoib_neigh_destructor(). One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING turned on, and then rerun this test. There's a good chance that this would diagnose the deadlock. (I don't have good access to my test machines right now, or else I would do it myself) OK, I did that. But I get [13440.761857] INFO: trying to register non-static key. [13440.766903] the code is fine but needs lockdep annotation. [13440.772455] turning off the locking correctness validator. and I am not sure what triggers this, or how to fix it to have the validator actually do its job. It usually indicates a spinlock is not properly initialized. Like __SPIN_LOCK_UNLOCKED() used in a non-static context, use spin_lock_init() in these cases. However looking at the code, ipoib_neight_destructor only uses priv-lock, and that seems to get properly initialized in ipoib_setup() using spin_lock_init(). So either there are other sites that instanciate those objects and forget about the lock init, or the object is corrupted (use after free?) Ingo, what key does the message refer to? The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c line 829. Full message below: [13440.761857] INFO: trying to register non-static key. [13440.766903] the code is fine but needs lockdep annotation. [13440.772455] turning off the locking correctness validator. [13440.778008] [c023c082] __lock_acquire+0xae4/0xbb9 [13440.783078] [c023c43d] lock_acquire+0x56/0x71 [13440.787784] [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] [13440.794412] [c051ad41] _spin_lock_irqsave+0x32/0x41 [13440.799649] [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] [13440.806275] [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] [13440.812897] [c04a1c1b] dst_run_gc+0xc/0x118 [13440.817439] [c022af6e] run_timer_softirq+0x37/0x16b [13440.822673] [c04a1c0f] dst_run_gc+0x0/0x118 [13440.827221] [c04a3eab] neigh_destroy+0xbe/0x104 [13440.832114] [c04a1bb1] dst_destroy+0x4d/0xab [13440.836751] [c04a1c64] dst_run_gc+0x55/0x118 [13440.841384] [c022b03f] run_timer_softirq+0x108/0x16b [13440.846711] [c0227634] __do_softirq+0x5a/0xd5 [13440.851427] [c023b435] trace_hardirqs_on+0x106/0x141 [13440.856754] [c0227643] __do_softirq+0x69/0xd5 [13440.861470] [c02276e6] do_softirq+0x37/0x4d [13440.866016] [c02167b0] smp_apic_timer_interrupt+0x6b/0x77 [13440.871774] [c02029ef] default_idle+0x3b/0x54 [13440.876491] [c02029ef] default_idle+0x3b/0x54 [13440.881211] [c0204c33] apic_timer_interrupt+0x33/0x38 [13440.886624] [c02029ef] default_idle+0x3b/0x54 [13440.891342] [c02029f1] default_idle+0x3d/0x54 [13440.896061] [c0202aaa] cpu_idle+0xa2/0xbb [13440.900436] === [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880 [13768.717353] [c031f919] _raw_spin_lock+0xda/0xfd [13768.722247] [c051ad48] _spin_lock_irqsave+0x39/0x41 [13768.727486] [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] [13768.734110] [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] [13768.740735] [c04a1c1b] dst_run_gc+0xc/0x118 [13768.745276] [c022af6e] run_timer_softirq+0x37/0x16b [13768.750517] [c04a1c0f] dst_run_gc+0x0/0x118 [13768.755061] [c04a3eab] neigh_destroy+0xbe/0x104 [13768.759955] [c04a1bb1] dst_destroy+0x4d/0xab [13768.764586] [c04a1c64] dst_run_gc+0x55/0x118 [13768.769218] [c022b03f] run_timer_softirq+0x108/0x16b [13768.774542] [c0227634] __do_softirq+0x5a/0xd5 [13768.779261] [c023b435] trace_hardirqs_on+0x106/0x141 [13768.784588] [c0227643] __do_softirq+0x69/0xd5 [13768.789308] [c02276e6] do_softirq+0x37/0x4d [13768.793851] [c02167b0] smp_apic_timer_interrupt+0x6b/0x77 [13768.799609] [c02029ef] default_idle+0x3b/0x54 [13768.804326] [c02029ef] default_idle+0x3b/0x54 [13768.809054] [c0204c33] apic_timer_interrupt+0x33/0x38 [13768.814471] [c02029ef] default_idle+0x3b/0x54 [13768.819187] [c02029f1] default_idle+0x3d/0x54 [13768.823903] [c0202aaa] cpu_idle+0xa2/0xbb [13768.828279] === -- MST - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
After adding some printks, I started getting these: [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! [ 597.041546] turning off the locking correctness validator. [ 597.047135] [c023a922] save_trace+0x8a/0x8f [ 597.051751] [c023ae8c] mark_lock+0x65/0x3ff [ 597.056366] [c023a8d6] save_trace+0x3e/0x8f [ 597.060980] [c023a9f0] add_lock_to_list+0x62/0x85 [ 597.066116] [c023b992] __lock_acquire+0x3f4/0xbb9 [ 597.071252] [f89da11f] send_mad+0x79/0x103 [ib_sa] [ 597.076474] [c031a475] idr_get_new_above_int+0x13c/0x216 [ 597.082225] [c023c43d] lock_acquire+0x56/0x71 [ 597.087018] [f89da11f] send_mad+0x79/0x103 [ib_sa] [ 597.092240] [c051ad41] _spin_lock_irqsave+0x32/0x41 [ 597.097547] [f89da11f] send_mad+0x79/0x103 [ib_sa] [ 597.102770] [f89da11f] send_mad+0x79/0x103 [ib_sa] [ 597.107989] [f89da8d9] ib_sa_path_rec_get+0x134/0x172 [ib_sa] [ 597.114166] [f899b73f] path_rec_start+0x115/0x143 [ib_ipoib] [ 597.120254] [f899cb38] path_rec_completion+0x0/0x4f4 [ib_ipoib] [ 597.126610] [f899b874] path_rec_create+0x77/0x9d [ib_ipoib] [ 597.132617] [f899c9fe] ipoib_start_xmit+0x441/0x57b [ib_ipoib] [ 597.13] [c051ae06] _spin_unlock_irqrestore+0x34/0x39 [ 597.144635] [c023b435] trace_hardirqs_on+0x106/0x141 [ 597.150035] [c04a058b] dev_queue_xmit+0x109/0x245 [ 597.155167] [c022ae27] __mod_timer+0x94/0x9e [ 597.159871] [c04a0423] dev_hard_start_xmit+0x1be/0x21d [ 597.165438] [c04a9fa9] __qdisc_run+0xd7/0x190 [ 597.170226] [c04a05b7] dev_queue_xmit+0x135/0x245 [ 597.175360] [c04ce267] arp_process+0x2c0/0x512 [ 597.180234] [f8954346] mthca_tavor_interrupt+0xf3/0x12b [ib_mthca] [ 597.186855] [c04a088b] netif_receive_skb+0x1c4/0x1da [ 597.192254] [c023b435] trace_hardirqs_on+0x106/0x141 [ 597.197648] [c04a0935] process_backlog+0x94/0x107 [ 597.202785] [c049f02b] net_rx_action+0x9a/0x15e [ 597.207743] [c0227643] __do_softirq+0x69/0xd5 [ 597.212530] [c02276e6] do_softirq+0x37/0x4d [ 597.217147] [c020617e] do_IRQ+0x5c/0x72 [ 597.221415] [c0204b52] common_interrupt+0x2e/0x34 [ 597.226549] [c02029ef] default_idle+0x3b/0x54 [ 597.231337] [c02029f1] default_idle+0x3d/0x54 [ 597.236124] [c0202aaa] cpu_idle+0xa2/0xbb [ 597.240567] === And sometimes these: [ 404.493572] KERNEL: assertion (!timer_pending(dev-watchdog_timer)) failed at net/sched/sch_generic.c (608) -- MST - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
* Michael S. Tsirkin [EMAIL PROTECTED] wrote: After adding some printks, I started getting these: [ 597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low! this should go away if you double the size of MAX_STACK_TRACE_ENTRIES in kernel/lockdep_internals.h. (keep it a power of two) If it doesnt go away then it might signal some sort of leak. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/