Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Ingo Molnar

* Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:

> After adding some printks, I started getting these:
> 
> [  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!

this should go away if you double the size of MAX_STACK_TRACE_ENTRIES in 
kernel/lockdep_internals.h. (keep it a power of two) If it doesnt go 
away then it might signal some sort of leak.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Michael S. Tsirkin

After adding some printks, I started getting these:

[  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!
[  597.041546] turning off the locking correctness validator.
[  597.047135]  [] save_trace+0x8a/0x8f
[  597.051751]  [] mark_lock+0x65/0x3ff
[  597.056366]  [] save_trace+0x3e/0x8f
[  597.060980]  [] add_lock_to_list+0x62/0x85
[  597.066116]  [] __lock_acquire+0x3f4/0xbb9
[  597.071252]  [] send_mad+0x79/0x103 [ib_sa]
[  597.076474]  [] idr_get_new_above_int+0x13c/0x216
[  597.082225]  [] lock_acquire+0x56/0x71
[  597.087018]  [] send_mad+0x79/0x103 [ib_sa]
[  597.092240]  [] _spin_lock_irqsave+0x32/0x41
[  597.097547]  [] send_mad+0x79/0x103 [ib_sa]
[  597.102770]  [] send_mad+0x79/0x103 [ib_sa]
[  597.107989]  [] ib_sa_path_rec_get+0x134/0x172 [ib_sa]
[  597.114166]  [] path_rec_start+0x115/0x143 [ib_ipoib]
[  597.120254]  [] path_rec_completion+0x0/0x4f4 [ib_ipoib]
[  597.126610]  [] path_rec_create+0x77/0x9d [ib_ipoib]
[  597.132617]  [] ipoib_start_xmit+0x441/0x57b [ib_ipoib]
[  597.13]  [] _spin_unlock_irqrestore+0x34/0x39
[  597.144635]  [] trace_hardirqs_on+0x106/0x141
[  597.150035]  [] dev_queue_xmit+0x109/0x245
[  597.155167]  [] __mod_timer+0x94/0x9e
[  597.159871]  [] dev_hard_start_xmit+0x1be/0x21d
[  597.165438]  [] __qdisc_run+0xd7/0x190
[  597.170226]  [] dev_queue_xmit+0x135/0x245
[  597.175360]  [] arp_process+0x2c0/0x512
[  597.180234]  [] mthca_tavor_interrupt+0xf3/0x12b [ib_mthca]
[  597.186855]  [] netif_receive_skb+0x1c4/0x1da
[  597.192254]  [] trace_hardirqs_on+0x106/0x141
[  597.197648]  [] process_backlog+0x94/0x107
[  597.202785]  [] net_rx_action+0x9a/0x15e
[  597.207743]  [] __do_softirq+0x69/0xd5
[  597.212530]  [] do_softirq+0x37/0x4d
[  597.217147]  [] do_IRQ+0x5c/0x72
[  597.221415]  [] common_interrupt+0x2e/0x34
[  597.226549]  [] default_idle+0x3b/0x54
[  597.231337]  [] default_idle+0x3d/0x54
[  597.236124]  [] cpu_idle+0xa2/0xbb
[  597.240567]  ===

And sometimes these:

[  404.493572] KERNEL: assertion (!timer_pending(>watchdog_timer)) failed 
at net/sched/sch_generic.c (608)

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Peter Zijlstra
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote:
> > Quoting Roland Dreier <[EMAIL PROTECTED]>:
> > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
> > 
> > >Feb 27 17:47:52 sw169 kernel:  [] 
> > >_spin_lock_irqsave+0x15/0x24
> > >Feb 27 17:47:52 sw169 kernel:  [] 
> > >:ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
> > 
> > It looks like this is deadlocking trying to take priv->lock in 
> > ipoib_neigh_destructor().
> > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING 
> > turned on, and then rerun this test.  There's a good chance that this would
> > diagnose the deadlock.  (I don't have good access to my test machines right 
> > now, or
> > else I would do it myself)
> 
> OK, I did that. But I get
>   [13440.761857] INFO: trying to register non-static key.
>   [13440.766903] the code is fine but needs lockdep annotation.
>   [13440.772455] turning off the locking correctness validator.
> and I am not sure what triggers this, or how to fix it to have the
> validator actually do its job.

It usually indicates a spinlock is not properly initialized. Like
__SPIN_LOCK_UNLOCKED() used in a non-static context, use
spin_lock_init() in these cases.

However looking at the code, ipoib_neight_destructor only uses
>lock, and that seems to get properly initialized in ipoib_setup()
using spin_lock_init().

So either there are other sites that instanciate those objects and
forget about the lock init, or the object is corrupted (use after free?)

> Ingo, what key does the message refer to?
> 
> The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c 
> line
> 829.
> 
> Full message below:
>   
> [13440.761857] INFO: trying to register non-static key.
> [13440.766903] the code is fine but needs lockdep annotation.
> [13440.772455] turning off the locking correctness validator.
> [13440.778008]  [] __lock_acquire+0xae4/0xbb9
> [13440.783078]  [] lock_acquire+0x56/0x71
> [13440.787784]  [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.794412]  [] _spin_lock_irqsave+0x32/0x41
> [13440.799649]  [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.806275]  [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.812897]  [] dst_run_gc+0xc/0x118
> [13440.817439]  [] run_timer_softirq+0x37/0x16b
> [13440.822673]  [] dst_run_gc+0x0/0x118
> [13440.827221]  [] neigh_destroy+0xbe/0x104
> [13440.832114]  [] dst_destroy+0x4d/0xab
> [13440.836751]  [] dst_run_gc+0x55/0x118
> [13440.841384]  [] run_timer_softirq+0x108/0x16b
> [13440.846711]  [] __do_softirq+0x5a/0xd5
> [13440.851427]  [] trace_hardirqs_on+0x106/0x141
> [13440.856754]  [] __do_softirq+0x69/0xd5
> [13440.861470]  [] do_softirq+0x37/0x4d
> [13440.866016]  [] smp_apic_timer_interrupt+0x6b/0x77
> [13440.871774]  [] default_idle+0x3b/0x54
> [13440.876491]  [] default_idle+0x3b/0x54
> [13440.881211]  [] apic_timer_interrupt+0x33/0x38
> [13440.886624]  [] default_idle+0x3b/0x54
> [13440.891342]  [] default_idle+0x3d/0x54
> [13440.896061]  [] cpu_idle+0xa2/0xbb
> [13440.900436]  ===
> [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
> [13768.717353]  [] _raw_spin_lock+0xda/0xfd
> [13768.722247]  [] _spin_lock_irqsave+0x39/0x41
> [13768.727486]  [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.734110]  [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.740735]  [] dst_run_gc+0xc/0x118
> [13768.745276]  [] run_timer_softirq+0x37/0x16b
> [13768.750517]  [] dst_run_gc+0x0/0x118
> [13768.755061]  [] neigh_destroy+0xbe/0x104
> [13768.759955]  [] dst_destroy+0x4d/0xab
> [13768.764586]  [] dst_run_gc+0x55/0x118
> [13768.769218]  [] run_timer_softirq+0x108/0x16b
> [13768.774542]  [] __do_softirq+0x5a/0xd5
> [13768.779261]  [] trace_hardirqs_on+0x106/0x141
> [13768.784588]  [] __do_softirq+0x69/0xd5
> [13768.789308]  [] do_softirq+0x37/0x4d
> [13768.793851]  [] smp_apic_timer_interrupt+0x6b/0x77
> [13768.799609]  [] default_idle+0x3b/0x54
> [13768.804326]  [] default_idle+0x3b/0x54
> [13768.809054]  [] apic_timer_interrupt+0x33/0x38
> [13768.814471]  [] default_idle+0x3b/0x54
> [13768.819187]  [] default_idle+0x3d/0x54
> [13768.823903]  [] cpu_idle+0xa2/0xbb
> [13768.828279]  ===
> 
> 
> -- 
> MST
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Michael S. Tsirkin
> Quoting Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft 
> lockup detected on CPU#0!)
> 
> 
> After adding some printks, I started getting these:
> 
> [  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!
> [  597.041546] turning off the locking correctness validator.

I looked at our stack usage a bit. It seems some work is in order.

$ make checkstack | grep ib_
0x0603 mthca_init_hca [ib_mthca]:   764
0x14ed mthca_init_hca [ib_mthca]:   764
0x65ae ipoib_cm_tx_start [ib_ipoib]:368
0x6b0b ipoib_cm_tx_start [ib_ipoib]:368
0x135f ib_uverbs_query_device [ib_uverbs]:  348
0x15f9 ib_uverbs_query_device [ib_uverbs]:  348
0x05d0 ib_ucm_init_qp_attr [ib_ucm]:300
0x0697 ib_ucm_init_qp_attr [ib_ucm]:300
0x7f9e ipoib_path_seq_show [ib_ipoib]:  264
0x8092 ipoib_path_seq_show [ib_ipoib]:  264
0x5b56 ipoib_cm_rx_handler [ib_ipoib]:  220
0x5eec ipoib_cm_rx_handler [ib_ipoib]:  220
0x7934 ipoib_cm_tx_handler [ib_ipoib]:  208
0x7ce0 ipoib_cm_tx_handler [ib_ipoib]:  208
0x32fe ib_uverbs_create_qp [ib_uverbs]: 192
0x36fd ib_uverbs_create_qp [ib_uverbs]: 192
0x28a9 srp_reset_host [ib_srp]: 192
0x2a96 srp_reset_host [ib_srp]: 192
0x1c99 show_sys_image_guid [ib_core]:   188
0x1d2b show_sys_image_guid [ib_core]:   188
0x01f9 ib_sa_service_rec_callback [ib_sa]:  180
0x0234 ib_sa_service_rec_callback [ib_sa]:  180
0x1b3c path_rec_completion [ib_ipoib]:  180
0x2020 path_rec_completion [ib_ipoib]:  180
0x70cf ipoib_cm_handle_rx_wc [ib_ipoib]:180
0x7402 ipoib_cm_handle_rx_wc [ib_ipoib]:180
0x09a7 srp_create_target [ib_srp]:  176
0x125f srp_create_target [ib_srp]:  176
0x0d9d ib_cm_listen [ib_cm]:172
0x10b3 ib_cm_listen [ib_cm]:172
0x4455 ipoib_mcast_send [ib_ipoib]: 172
0x48e0 ipoib_mcast_send [ib_ipoib]: 172
0x15c1 ipoib_start_xmit [ib_ipoib]: 164
0x1b2d ipoib_start_xmit [ib_ipoib]: 164
0x56c8 mthca_make_profile [ib_mthca]:   160
0x6051 mthca_make_profile [ib_mthca]:   160
0x2abb ipoib_ib_dev_stop [ib_ipoib]:160
0x2d19 ipoib_ib_dev_stop [ib_ipoib]:160
0x202b ib_uverbs_query_qp [ib_uverbs]:  156
0x22c0 ib_uverbs_query_qp [ib_uverbs]:  156
0x5269 ipoib_init_qp [ib_ipoib]:152
0x53bc ipoib_init_qp [ib_ipoib]:152
0x327f ipoib_mcast_join [ib_ipoib]: 144
0x349d ipoib_mcast_join [ib_ipoib]: 144
0x2092 ib_find_send_mad [ib_mad]:   140
0x23fa ib_find_send_mad [ib_mad]:   140
0x22cf ib_uverbs_modify_qp [ib_uverbs]: 140
0x24f2 ib_uverbs_modify_qp [ib_uverbs]: 140
0xbc8e mthca_modify_qp [ib_mthca]:  136
0xc9cc mthca_modify_qp [ib_mthca]:  136
0x00010cb1 mthca_reg_phys_mr [ib_mthca]:136
0x0001117a mthca_reg_phys_mr [ib_mthca]:136
0x35b4 ipoib_mcast_join_finish [ib_ipoib]:  136
0x3a33 ipoib_mcast_join_finish [ib_ipoib]:  136
0x0793 iser_cma_handler [ib_iser]:  132
0x0bc1 iser_cma_handler [ib_iser]:  132
0x1e37 srp_queuecommand [ib_srp]:   132
0x273b srp_queuecommand [ib_srp]:   132
0x8a5a mthca_poll_cq [ib_mthca]:128
0x9204 mthca_poll_cq [ib_mthca]:128
0x3a42 ipoib_mcast_join_complete [ib_ipoib]:128
0x3e6e ipoib_mcast_join_complete [ib_ipoib]:128
0x4a58 ipoib_mcast_restart_task [ib_ipoib]: 128
0x4eb8 ipoib_mcast_restart_task [ib_ipoib]: 128
0x38e6 ib_uverbs_create_ah [ib_uverbs]: 116
0x3ac4 ib_uverbs_create_ah [ib_uverbs]: 116
0xf6a5 mthca_process_mad [ib_mthca]:116
0xfa93 mthca_process_mad [ib_mthca]:116
0x11ef mcast_work_handler [ib_sa]:  112
0x16e6 mcast_work_handler [ib_sa]:  112
0x0a20 ib_ucm_send_req [ib_ucm]:112
0x0b7c ib_ucm_send_req [ib_ucm]:112
0x1697 ib_post_send_mad [ib_mad]:   112
0x1b05 ib_post_send_mad [ib_mad]:   112
0x030e iser_post_send [ib_iser]:112
0x03c5 iser_post_send 

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Michael S. Tsirkin
 Quoting Michael S. Tsirkin [EMAIL PROTECTED]:
 Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft 
 lockup detected on CPU#0!)
 
 
 After adding some printks, I started getting these:
 
 [  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!
 [  597.041546] turning off the locking correctness validator.

I looked at our stack usage a bit. It seems some work is in order.

$ make checkstack | grep ib_
0x0603 mthca_init_hca [ib_mthca]:   764
0x14ed mthca_init_hca [ib_mthca]:   764
0x65ae ipoib_cm_tx_start [ib_ipoib]:368
0x6b0b ipoib_cm_tx_start [ib_ipoib]:368
0x135f ib_uverbs_query_device [ib_uverbs]:  348
0x15f9 ib_uverbs_query_device [ib_uverbs]:  348
0x05d0 ib_ucm_init_qp_attr [ib_ucm]:300
0x0697 ib_ucm_init_qp_attr [ib_ucm]:300
0x7f9e ipoib_path_seq_show [ib_ipoib]:  264
0x8092 ipoib_path_seq_show [ib_ipoib]:  264
0x5b56 ipoib_cm_rx_handler [ib_ipoib]:  220
0x5eec ipoib_cm_rx_handler [ib_ipoib]:  220
0x7934 ipoib_cm_tx_handler [ib_ipoib]:  208
0x7ce0 ipoib_cm_tx_handler [ib_ipoib]:  208
0x32fe ib_uverbs_create_qp [ib_uverbs]: 192
0x36fd ib_uverbs_create_qp [ib_uverbs]: 192
0x28a9 srp_reset_host [ib_srp]: 192
0x2a96 srp_reset_host [ib_srp]: 192
0x1c99 show_sys_image_guid [ib_core]:   188
0x1d2b show_sys_image_guid [ib_core]:   188
0x01f9 ib_sa_service_rec_callback [ib_sa]:  180
0x0234 ib_sa_service_rec_callback [ib_sa]:  180
0x1b3c path_rec_completion [ib_ipoib]:  180
0x2020 path_rec_completion [ib_ipoib]:  180
0x70cf ipoib_cm_handle_rx_wc [ib_ipoib]:180
0x7402 ipoib_cm_handle_rx_wc [ib_ipoib]:180
0x09a7 srp_create_target [ib_srp]:  176
0x125f srp_create_target [ib_srp]:  176
0x0d9d ib_cm_listen [ib_cm]:172
0x10b3 ib_cm_listen [ib_cm]:172
0x4455 ipoib_mcast_send [ib_ipoib]: 172
0x48e0 ipoib_mcast_send [ib_ipoib]: 172
0x15c1 ipoib_start_xmit [ib_ipoib]: 164
0x1b2d ipoib_start_xmit [ib_ipoib]: 164
0x56c8 mthca_make_profile [ib_mthca]:   160
0x6051 mthca_make_profile [ib_mthca]:   160
0x2abb ipoib_ib_dev_stop [ib_ipoib]:160
0x2d19 ipoib_ib_dev_stop [ib_ipoib]:160
0x202b ib_uverbs_query_qp [ib_uverbs]:  156
0x22c0 ib_uverbs_query_qp [ib_uverbs]:  156
0x5269 ipoib_init_qp [ib_ipoib]:152
0x53bc ipoib_init_qp [ib_ipoib]:152
0x327f ipoib_mcast_join [ib_ipoib]: 144
0x349d ipoib_mcast_join [ib_ipoib]: 144
0x2092 ib_find_send_mad [ib_mad]:   140
0x23fa ib_find_send_mad [ib_mad]:   140
0x22cf ib_uverbs_modify_qp [ib_uverbs]: 140
0x24f2 ib_uverbs_modify_qp [ib_uverbs]: 140
0xbc8e mthca_modify_qp [ib_mthca]:  136
0xc9cc mthca_modify_qp [ib_mthca]:  136
0x00010cb1 mthca_reg_phys_mr [ib_mthca]:136
0x0001117a mthca_reg_phys_mr [ib_mthca]:136
0x35b4 ipoib_mcast_join_finish [ib_ipoib]:  136
0x3a33 ipoib_mcast_join_finish [ib_ipoib]:  136
0x0793 iser_cma_handler [ib_iser]:  132
0x0bc1 iser_cma_handler [ib_iser]:  132
0x1e37 srp_queuecommand [ib_srp]:   132
0x273b srp_queuecommand [ib_srp]:   132
0x8a5a mthca_poll_cq [ib_mthca]:128
0x9204 mthca_poll_cq [ib_mthca]:128
0x3a42 ipoib_mcast_join_complete [ib_ipoib]:128
0x3e6e ipoib_mcast_join_complete [ib_ipoib]:128
0x4a58 ipoib_mcast_restart_task [ib_ipoib]: 128
0x4eb8 ipoib_mcast_restart_task [ib_ipoib]: 128
0x38e6 ib_uverbs_create_ah [ib_uverbs]: 116
0x3ac4 ib_uverbs_create_ah [ib_uverbs]: 116
0xf6a5 mthca_process_mad [ib_mthca]:116
0xfa93 mthca_process_mad [ib_mthca]:116
0x11ef mcast_work_handler [ib_sa]:  112
0x16e6 mcast_work_handler [ib_sa]:  112
0x0a20 ib_ucm_send_req [ib_ucm]:112
0x0b7c ib_ucm_send_req [ib_ucm]:112
0x1697 ib_post_send_mad [ib_mad]:   112
0x1b05 ib_post_send_mad [ib_mad]:   112
0x030e iser_post_send [ib_iser]:112
0x03c5 iser_post_send [ib_iser]:112
0x1605

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Peter Zijlstra
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote:
  Quoting Roland Dreier [EMAIL PROTECTED]:
  Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
  
  Feb 27 17:47:52 sw169 kernel:  [8053aaf1] 
  _spin_lock_irqsave+0x15/0x24
  Feb 27 17:47:52 sw169 kernel:  [88067a23] 
  :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
  
  It looks like this is deadlocking trying to take priv-lock in 
  ipoib_neigh_destructor().
  One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING 
  turned on, and then rerun this test.  There's a good chance that this would
  diagnose the deadlock.  (I don't have good access to my test machines right 
  now, or
  else I would do it myself)
 
 OK, I did that. But I get
   [13440.761857] INFO: trying to register non-static key.
   [13440.766903] the code is fine but needs lockdep annotation.
   [13440.772455] turning off the locking correctness validator.
 and I am not sure what triggers this, or how to fix it to have the
 validator actually do its job.

It usually indicates a spinlock is not properly initialized. Like
__SPIN_LOCK_UNLOCKED() used in a non-static context, use
spin_lock_init() in these cases.

However looking at the code, ipoib_neight_destructor only uses
priv-lock, and that seems to get properly initialized in ipoib_setup()
using spin_lock_init().

So either there are other sites that instanciate those objects and
forget about the lock init, or the object is corrupted (use after free?)

 Ingo, what key does the message refer to?
 
 The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c 
 line
 829.
 
 Full message below:
   
 [13440.761857] INFO: trying to register non-static key.
 [13440.766903] the code is fine but needs lockdep annotation.
 [13440.772455] turning off the locking correctness validator.
 [13440.778008]  [c023c082] __lock_acquire+0xae4/0xbb9
 [13440.783078]  [c023c43d] lock_acquire+0x56/0x71
 [13440.787784]  [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
 [13440.794412]  [c051ad41] _spin_lock_irqsave+0x32/0x41
 [13440.799649]  [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
 [13440.806275]  [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
 [13440.812897]  [c04a1c1b] dst_run_gc+0xc/0x118
 [13440.817439]  [c022af6e] run_timer_softirq+0x37/0x16b
 [13440.822673]  [c04a1c0f] dst_run_gc+0x0/0x118
 [13440.827221]  [c04a3eab] neigh_destroy+0xbe/0x104
 [13440.832114]  [c04a1bb1] dst_destroy+0x4d/0xab
 [13440.836751]  [c04a1c64] dst_run_gc+0x55/0x118
 [13440.841384]  [c022b03f] run_timer_softirq+0x108/0x16b
 [13440.846711]  [c0227634] __do_softirq+0x5a/0xd5
 [13440.851427]  [c023b435] trace_hardirqs_on+0x106/0x141
 [13440.856754]  [c0227643] __do_softirq+0x69/0xd5
 [13440.861470]  [c02276e6] do_softirq+0x37/0x4d
 [13440.866016]  [c02167b0] smp_apic_timer_interrupt+0x6b/0x77
 [13440.871774]  [c02029ef] default_idle+0x3b/0x54
 [13440.876491]  [c02029ef] default_idle+0x3b/0x54
 [13440.881211]  [c0204c33] apic_timer_interrupt+0x33/0x38
 [13440.886624]  [c02029ef] default_idle+0x3b/0x54
 [13440.891342]  [c02029f1] default_idle+0x3d/0x54
 [13440.896061]  [c0202aaa] cpu_idle+0xa2/0xbb
 [13440.900436]  ===
 [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
 [13768.717353]  [c031f919] _raw_spin_lock+0xda/0xfd
 [13768.722247]  [c051ad48] _spin_lock_irqsave+0x39/0x41
 [13768.727486]  [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
 [13768.734110]  [f899bff2] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
 [13768.740735]  [c04a1c1b] dst_run_gc+0xc/0x118
 [13768.745276]  [c022af6e] run_timer_softirq+0x37/0x16b
 [13768.750517]  [c04a1c0f] dst_run_gc+0x0/0x118
 [13768.755061]  [c04a3eab] neigh_destroy+0xbe/0x104
 [13768.759955]  [c04a1bb1] dst_destroy+0x4d/0xab
 [13768.764586]  [c04a1c64] dst_run_gc+0x55/0x118
 [13768.769218]  [c022b03f] run_timer_softirq+0x108/0x16b
 [13768.774542]  [c0227634] __do_softirq+0x5a/0xd5
 [13768.779261]  [c023b435] trace_hardirqs_on+0x106/0x141
 [13768.784588]  [c0227643] __do_softirq+0x69/0xd5
 [13768.789308]  [c02276e6] do_softirq+0x37/0x4d
 [13768.793851]  [c02167b0] smp_apic_timer_interrupt+0x6b/0x77
 [13768.799609]  [c02029ef] default_idle+0x3b/0x54
 [13768.804326]  [c02029ef] default_idle+0x3b/0x54
 [13768.809054]  [c0204c33] apic_timer_interrupt+0x33/0x38
 [13768.814471]  [c02029ef] default_idle+0x3b/0x54
 [13768.819187]  [c02029f1] default_idle+0x3d/0x54
 [13768.823903]  [c0202aaa] cpu_idle+0xa2/0xbb
 [13768.828279]  ===
 
 
 -- 
 MST
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please 

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Michael S. Tsirkin

After adding some printks, I started getting these:

[  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!
[  597.041546] turning off the locking correctness validator.
[  597.047135]  [c023a922] save_trace+0x8a/0x8f
[  597.051751]  [c023ae8c] mark_lock+0x65/0x3ff
[  597.056366]  [c023a8d6] save_trace+0x3e/0x8f
[  597.060980]  [c023a9f0] add_lock_to_list+0x62/0x85
[  597.066116]  [c023b992] __lock_acquire+0x3f4/0xbb9
[  597.071252]  [f89da11f] send_mad+0x79/0x103 [ib_sa]
[  597.076474]  [c031a475] idr_get_new_above_int+0x13c/0x216
[  597.082225]  [c023c43d] lock_acquire+0x56/0x71
[  597.087018]  [f89da11f] send_mad+0x79/0x103 [ib_sa]
[  597.092240]  [c051ad41] _spin_lock_irqsave+0x32/0x41
[  597.097547]  [f89da11f] send_mad+0x79/0x103 [ib_sa]
[  597.102770]  [f89da11f] send_mad+0x79/0x103 [ib_sa]
[  597.107989]  [f89da8d9] ib_sa_path_rec_get+0x134/0x172 [ib_sa]
[  597.114166]  [f899b73f] path_rec_start+0x115/0x143 [ib_ipoib]
[  597.120254]  [f899cb38] path_rec_completion+0x0/0x4f4 [ib_ipoib]
[  597.126610]  [f899b874] path_rec_create+0x77/0x9d [ib_ipoib]
[  597.132617]  [f899c9fe] ipoib_start_xmit+0x441/0x57b [ib_ipoib]
[  597.13]  [c051ae06] _spin_unlock_irqrestore+0x34/0x39
[  597.144635]  [c023b435] trace_hardirqs_on+0x106/0x141
[  597.150035]  [c04a058b] dev_queue_xmit+0x109/0x245
[  597.155167]  [c022ae27] __mod_timer+0x94/0x9e
[  597.159871]  [c04a0423] dev_hard_start_xmit+0x1be/0x21d
[  597.165438]  [c04a9fa9] __qdisc_run+0xd7/0x190
[  597.170226]  [c04a05b7] dev_queue_xmit+0x135/0x245
[  597.175360]  [c04ce267] arp_process+0x2c0/0x512
[  597.180234]  [f8954346] mthca_tavor_interrupt+0xf3/0x12b [ib_mthca]
[  597.186855]  [c04a088b] netif_receive_skb+0x1c4/0x1da
[  597.192254]  [c023b435] trace_hardirqs_on+0x106/0x141
[  597.197648]  [c04a0935] process_backlog+0x94/0x107
[  597.202785]  [c049f02b] net_rx_action+0x9a/0x15e
[  597.207743]  [c0227643] __do_softirq+0x69/0xd5
[  597.212530]  [c02276e6] do_softirq+0x37/0x4d
[  597.217147]  [c020617e] do_IRQ+0x5c/0x72
[  597.221415]  [c0204b52] common_interrupt+0x2e/0x34
[  597.226549]  [c02029ef] default_idle+0x3b/0x54
[  597.231337]  [c02029f1] default_idle+0x3d/0x54
[  597.236124]  [c0202aaa] cpu_idle+0xa2/0xbb
[  597.240567]  ===

And sometimes these:

[  404.493572] KERNEL: assertion (!timer_pending(dev-watchdog_timer)) failed 
at net/sched/sch_generic.c (608)

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Ingo Molnar

* Michael S. Tsirkin [EMAIL PROTECTED] wrote:

 After adding some printks, I started getting these:
 
 [  597.036720] BUG: MAX_STACK_TRACE_ENTRIES too low!

this should go away if you double the size of MAX_STACK_TRACE_ENTRIES in 
kernel/lockdep_internals.h. (keep it a power of two) If it doesnt go 
away then it might signal some sort of leak.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/