Hi, Feng Guang

On 09/05/2012 09:11 AM, Fengguang Wu wrote:
> Hi,
> 
> Here is an old problem that happens also in 3.4. It's very unreliable:
> it may only happen once per 3000 boots..
> 
> [   10.968565] reboot: machine restart
> [   10.983510] ------------[ cut here ]------------
> [   10.984218] WARNING: at 
> /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-00005-gb374aa1 
> #10
> [   10.987185] Call Trace:
> [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50

So this cpu try to fire a nohz balance kick ipi to an offline cpu?

May be we are choosing a wrong cpu to kick but that's not the point,
what I can't understand is why this cpu could do this kick.

We have nohz_kick_needed() to check whether current cpu should do kick ,
and the first condition we need to match is that current cpu should be
idle, but the trace show current pid is 88 not 0.

We should add Peter to cc list, may be he will be interested on what
happened.

Regards,
Michael Wang

> [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70
> [   10.987506]  [<79071187>] tick_sched_timer+0x57/0xb0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [   10.987506]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [   10.987506]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [   10.987506]  [<79882631>] apic_timer_interrupt+0x31/0x40
> [   10.987506]  [<7905007b>] ? call_srcu+0x2b/0x70
> [   10.987506]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [   10.987506]  [<7988194f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [   10.987506]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [   10.987506]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [   10.987506]  [<7988044a>] __schedule+0x38a/0x770
> [   10.987506]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [   10.987506]  [<7987ea40>] ? schedule_timeout+0x100/0x1b0
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7988084e>] schedule+0x1e/0x50
> [   10.987506]  [<7987ea45>] schedule_timeout+0x105/0x1b0
> [   10.987506]  [<7903adb0>] ? __internal_add_timer+0xb0/0xb0
> [   10.987506]  [<796842f2>] pktgen_thread_worker+0x1342/0x1390
> [   10.987506]  [<7988044a>] ? __schedule+0x38a/0x770
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [   10.987506]  [<79682fb0>] ? pktgen_if_write+0x2210/0x2210
> [   10.987506]  [<79049ff8>] kthread+0x78/0x80
> [   10.987506]  [<79880000>] ? __up.isra.0+0xd/0x2d
> [   10.987506]  [<79049f80>] ? insert_kthread_work+0x70/0x70
> [   10.987506]  [<798830c6>] kernel_thread_helper+0x6/0xd
> 
> Here are all the oops messages I collected in the past days:
> 
> [    4.815145] Restarting system.
> [    4.815644] reboot: machine restart
> [    4.824591] ------------[ cut here ]------------
> [    4.825423] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [    4.826881] Pid: 18, comm: kworker/0:1 Not tainted 
> 3.6.0-rc3-bisect-00007-g6320675 #25
> [    4.828116] Call Trace:
> [    4.828533]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [    4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [    4.828585]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [    4.828585]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [    4.828585]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> [    4.828585]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [    4.828585]  [<79056d14>] scheduler_tick+0xd4/0x100
> [    4.828585]  [<7903bde5>] update_process_times+0x55/0x70
> [    4.828585]  [<79071187>] tick_sched_timer+0x57/0xb0
> [    4.828585]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [    4.828585]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [    4.828585]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [    4.828585]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [    4.828585]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [    4.828585]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [    4.828585]  [<79882401>] apic_timer_interrupt+0x31/0x40
> [    4.828585]  [<7905007b>] ? call_srcu+0x2b/0x70
> [    4.828585]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [    4.828585]  [<7988171f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [    4.828585]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [    4.828585]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [    4.828585]  [<7988021a>] __schedule+0x38a/0x770
> [    4.828585]  [<79045529>] ? worker_thread+0x1a9/0x380
> [    4.828585]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [    4.828585]  [<7988061e>] schedule+0x1e/0x50
> [    4.828585]  [<7904552e>] worker_thread+0x1ae/0x380
> [    4.828585]  [<79056ed9>] ? complete+0x49/0x60
> [    4.828585]  [<79045380>] ? manage_workers.isra.23+0x250/0x250
> [    4.828585]  [<79049ff8>] kthread+0x78/0x80
> [    4.828585]  [<79880000>] ? __schedule+0x170/0x770
> [    4.828585]  [<79049f80>] ? insert_kthread_work+0x70/0x70
> [    4.828585]  [<79882e86>] kernel_thread_helper+0x6/0xd
> [    4.828585] ---[ end trace ae11d0bb45464df2 ]---
> --
> [    9.195595] Restarting system.
> [    9.196142] reboot: machine restart
> [    9.228474] ------------[ cut here ]------------
> [    9.229329] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [    9.230890] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-bisect #28
> [    9.232034] Call Trace:
> [    9.232468]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> [    9.232468]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [    9.232468]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> [    9.232468]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> [    9.232468]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> [    9.232468]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> [    9.232468]  [<79056d14>] scheduler_tick+0xd4/0x100
> [    9.232468]  [<7903bde5>] update_process_times+0x55/0x70
> [    9.232468]  [<79071187>] tick_sched_timer+0x57/0xb0
> [    9.232468]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [    9.232468]  [<7904e0b7>] __run_hrtimer.isra.29+0x57/0x100
> [    9.232468]  [<79071130>] ? tick_nohz_handler+0xe0/0xe0
> [    9.232468]  [<7904ed45>] hrtimer_interrupt+0xe5/0x280
> [    9.232468]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [    9.232468]  [<7901f9a4>] smp_apic_timer_interrupt+0x54/0x90
> [    9.232468]  [<79882401>] apic_timer_interrupt+0x31/0x40
> [    9.232468]  [<7905007b>] ? call_srcu+0x2b/0x70
> [    9.232468]  [<793a00e0>] ? __bitmap_intersects+0x10/0x80
> [    9.232468]  [<7988171f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [    9.232468]  [<7905307f>] finish_task_switch+0x7f/0xd0
> [    9.232468]  [<79053038>] ? finish_task_switch+0x38/0xd0
> [    9.232468]  [<7988021a>] __schedule+0x38a/0x770
> [    9.232468]  [<7905a5a7>] ? sched_clock_cpu+0xc7/0x150
> [    9.232468]  [<7987e810>] ? schedule_timeout+0x100/0x1b0
> [    9.232468]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [    9.232468]  [<7988061e>] schedule+0x1e/0x50
> [    9.232468]  [<7987e815>] schedule_timeout+0x105/0x1b0
> [    9.232468]  [<7903adb0>] ? __internal_add_timer+0xb0/0xb0
> [    9.232468]  [<796840c2>] pktgen_thread_worker+0x1342/0x1390
> [    9.232468]  [<7988021a>] ? __schedule+0x38a/0x770
> [    9.232468]  [<793accee>] ? do_raw_spin_unlock+0x4e/0x90
> [    9.232468]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [    9.232468]  [<7904aa40>] ? abort_exclusive_wait+0x80/0x80
> [    9.232468]  [<79682d80>] ? pktgen_if_write+0x2210/0x2210
> [    9.232468]  [<79049ff8>] kthread+0x78/0x80
> [    9.232468]  [<79880000>] ? __schedule+0x170/0x770
> [    9.232468]  [<79049f80>] ? insert_kthread_work+0x70/0x70
> [    9.232468]  [<79882e86>] kernel_thread_helper+0x6/0xd
> [    9.232468] ---[ end trace 4b6a1d80f6ec28ef ]---
> --
> [    4.665800] Restarting system.
> [    4.666537] machine restart
> [    4.682180] ------------[ cut here ]------------
> [    4.683044] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:120 
> native_smp_send_reschedule+0x46/0x50()
> [    4.684515] Pid: 115, comm: reboot Not tainted 3.4.0-bisect #30
> [    4.685459] Call Trace:
> [    4.685919]  [<7984acc5>] ? printk+0x18/0x1a
> [    4.686173]  [<7902ab6a>] warn_slowpath_common+0x5a/0x80
> [    4.686173]  [<7901ac46>] ? native_smp_send_reschedule+0x46/0x50
> [    4.686173]  [<7901ac46>] ? native_smp_send_reschedule+0x46/0x50
> [    4.686173]  [<7902ac3d>] warn_slowpath_null+0x1d/0x20
> [    4.686173]  [<7901ac46>] native_smp_send_reschedule+0x46/0x50
> [    4.686173]  [<79058885>] trigger_load_balance+0x295/0x340
> [    4.686173]  [<7904f43f>] scheduler_tick+0xcf/0x100
> [    4.686173]  [<79035985>] update_process_times+0x55/0x70
> [    4.686173]  [<7906979b>] tick_sched_timer+0x5b/0xc0
> [    4.686173]  [<7939794e>] ? do_raw_spin_unlock+0x4e/0x90
> [    4.686173]  [<79047167>] __run_hrtimer.isra.29+0x57/0x100
> [    4.686173]  [<79069740>] ? tick_nohz_handler+0xe0/0xe0
> [    4.686173]  [<79047dd5>] hrtimer_interrupt+0xd5/0x240
> [    4.686173]  [<7901b844>] smp_apic_timer_interrupt+0x54/0x90
> [    4.686173]  [<79858e31>] apic_timer_interrupt+0x31/0x40
> [    4.686173]  [<793877be>] ? delay_tsc+0x3e/0xa0
> [    4.686173]  [<7938785b>] __const_udelay+0x1b/0x20
> [    4.686173]  [<7901adff>] native_nmi_stop_other_cpus+0xaf/0xe0
> [    4.686173]  [<7901a237>] native_machine_shutdown+0x57/0x80
> [    4.686173]  [<79021c79>] kvm_shutdown+0x19/0x20
> [    4.686173]  [<7901a1c2>] native_machine_restart+0x22/0x40
> [    4.686173]  [<7901a3f9>] machine_restart+0x9/0x10
> [    4.686173]  [<7903a643>] kernel_restart+0x33/0x50
> [    4.686173]  [<7903a86d>] sys_reboot+0x13d/0x1e0
> [    4.686173]  [<79052cd7>] ? sched_clock_cpu+0xc7/0x150
> [    4.686173]  [<7902409f>] ? do_page_fault+0x19f/0x3e0
> [    4.686173]  [<79021b60>] ? kvm_read_and_reset_pf_reason+0x40/0x40
> [    4.686173]  [<79048466>] ? up_read+0x16/0x30
> [    4.686173]  [<7902409f>] ? do_page_fault+0x19f/0x3e0
> [    4.686173]  [<790c805b>] ? iterate_supers+0xab/0xd0
> [    4.686173]  [<7939794e>] ? do_raw_spin_unlock+0x4e/0x90
> [    4.686173]  [<79857eed>] ? _raw_spin_unlock+0x1d/0x30
> [    4.686173]  [<790c805b>] ? iterate_supers+0xab/0xd0
> [    4.686173]  [<790e9bd0>] ? __sync_filesystem+0xa0/0xa0
> [    4.686173]  [<79859327>] sysenter_do_call+0x12/0x37
> [    4.686173] ---[ end trace 9d854ed5f523a273 ]---
> --
> [    5.815913] Restarting system.
> [    5.816485] machine restart
> [    5.824887] ------------[ cut here ]------------
> [    5.825679] WARNING: at /c/wfg/linux/arch/x86/kernel/smp.c:123 
> native_smp_send_reschedule+0x46/0x50()
> [    5.827263] Pid: 16, comm: kworker/0:1 Not tainted 3.5.0-bisect #29
> [    5.828336] Call Trace:
> [    5.828793]  [<7902bb2a>] warn_slowpath_common+0x5a/0x80
> [    5.828882]  [<7901ba16>] ? native_smp_send_reschedule+0x46/0x50
> [    5.828882]  [<7901ba16>] ? native_smp_send_reschedule+0x46/0x50
> [    5.828882]  [<7902bbfd>] warn_slowpath_null+0x1d/0x20
> [    5.828882]  [<7901ba16>] native_smp_send_reschedule+0x46/0x50
> [    5.828882]  [<7905bc0d>] trigger_load_balance+0x1bd/0x250
> [    5.828882]  [<790529e4>] scheduler_tick+0xd4/0x100
> [    5.828882]  [<79038245>] update_process_times+0x55/0x70
> [    5.828882]  [<7906cc4b>] tick_sched_timer+0x5b/0xc0
> [    5.828882]  [<7939fcce>] ? do_raw_spin_unlock+0x4e/0x90
> [    5.828882]  [<79049f37>] __run_hrtimer.isra.29+0x57/0x100
> [    5.828882]  [<7906cbf0>] ? tick_nohz_handler+0xe0/0xe0
> [    5.828882]  [<7904abc5>] hrtimer_interrupt+0xe5/0x280
> [    5.828882]  [<7901c5a4>] smp_apic_timer_interrupt+0x54/0x90
> [    5.828882]  [<79870351>] apic_timer_interrupt+0x31/0x40
> [    5.828882]  [<7904007b>] ? start_worker+0x2b/0x30
> [    5.828882]  [<793900e0>] ? insn_get_prefixes+0x150/0x240
> [    5.828882]  [<7986f46f>] ? _raw_spin_unlock_irq+0x1f/0x40
> [    5.828882]  [<7904edcf>] finish_task_switch+0x7f/0xd0
> [    5.828882]  [<7904ed88>] ? finish_task_switch+0x38/0xd0
> [    5.828882]  [<7986df6a>] __schedule+0x38a/0x770
> [    5.828882]  [<790415fc>] ? worker_thread+0x16c/0x270
> [    5.828882]  [<7939fcce>] ? do_raw_spin_unlock+0x4e/0x90
> [    5.828882]  [<7986e36e>] schedule+0x1e/0x50
> [    5.828882]  [<79041601>] worker_thread+0x171/0x270
> [    5.828882]  [<79052ba9>] ? complete+0x49/0x60
> [    5.828882]  [<79041490>] ? manage_workers.isra.26+0x1c0/0x1c0
> [    5.828882]  [<79045f18>] kthread+0x78/0x80
> [    5.828882]  [<79860000>] ? ip_vs_control_net_cleanup+0xc0/0xd0
> [    5.828882]  [<79045ea0>] ? kthread_flush_work_fn+0x10/0x10
> [    5.828882]  [<79870dc6>] kernel_thread_helper+0x6/0xd
> [    5.828882] ---[ end trace df77608ce627a14d ]---
> 
> Thanks,
> Fengguang
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to