Nathan Lynch <nath...@linux.ibm.com> writes:
> I'm hoping for some help investigating a behavior I see when doing cpu
> hotplug under load on P9 and P8 LPARs. Occasionally, while coming online
> a cpu will seem to get "stuck" in idle, with a pending doorbell
> interrupt unserviced (cpu 12 here):
>
> cpuhp/12-70 [012] 46133.602202: cpuhp_enter: cpu: 0012 target: 205 step: 174 (0xc000000000028920s)
> load.sh-8201 [014] 46133.602248: sched_waking: comm=cpuhp/12 pid=70 prio=120 target_cpu=012
> load.sh-8201 [014] 46133.602251: smp_send_reschedule: (c000000000052868) cpu=12
> <idle>-0 [012] 46133.602252: do_idle: (c000000000162e08)
> load.sh-8201 [014] 46133.602252: smp_muxed_ipi_message_pass: (c0000000000527e8) cpu=12 msg=1
> load.sh-8201 [014] 46133.602253: doorbell_core_ipi: (c00000000004d3e8) cpu=12
> <idle>-0 [012] 46133.602257: arch_cpu_idle: (c000000000022d08)
> <idle>-0 [012] 46133.602259: pseries_lpar_idle: (c0000000000d43c8)

I should be more explicit that, given my tracing configuration, I would
expect to see doorbell events etc. here, e.g.

<idle>-0 [012] 46133.602086: doorbell_entry: pt_regs=0xc000000200e7fb50
<idle>-0 [012] 46133.602087: smp_ipi_demux_relaxed: (c0000000000530f8)
<idle>-0 [012] 46133.602088: scheduler_ipi: (c00000000015e4f8)
<idle>-0 [012] 46133.602091: sched_wakeup: cpuhp/12:70 [120] success=1 CPU:012
<idle>-0 [012] 46133.602092: sched_wakeup: migration/12:71 [0] success=1 CPU:012
<idle>-0 [012] 46133.602093: doorbell_exit: pt_regs=0xc000000200e7fb50

but instead cpu 12 goes to idle.
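
For anyone who wants to look for the same pattern in their own captures, here is a rough sketch of a filter that flags a doorbell_core_ipi aimed at a cpu when no doorbell_entry shows up on that cpu afterwards. It is only an illustration, not part of the original report. It assumes the text layout of the trace lines quoted in this thread (task-pid, [cpu], timestamp, event name) and that the lines are already in time order. The name find_unserviced_doorbells and the SAMPLE constant are made up for this example.

```python
import re

# Layout assumed from the trace lines in this thread:
#   task-pid [cpu] timestamp: event: details
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<rest>.*)$"
)
# The probe events in this thread report the target as "cpu=N".
TARGET_RE = re.compile(r"\bcpu=(\d+)")


def find_unserviced_doorbells(lines):
    """Return [(target_cpu, send_timestamp), ...] for every doorbell_core_ipi
    that has no later doorbell_entry on the target cpu within the input."""
    pending = {}  # target cpu -> timestamp of the oldest unanswered doorbell
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a trace line
        event = m.group("event")
        if event == "doorbell_core_ipi":
            target = TARGET_RE.search(m.group("rest"))
            if target:
                # Keep the first unanswered send; later sends do not replace it.
                pending.setdefault(int(target.group(1)), m.group("ts"))
        elif event == "doorbell_entry":
            # The cpu that logged the entry has taken its doorbell interrupt.
            pending.pop(int(m.group("cpu")), None)
    return sorted(pending.items())


# Lines copied from the failing case reported in this thread.
SAMPLE = """\
load.sh-8201 [014] 46133.602251: smp_send_reschedule: (c000000000052868) cpu=12
<idle>-0 [012] 46133.602252: do_idle: (c000000000162e08)
load.sh-8201 [014] 46133.602253: doorbell_core_ipi: (c00000000004d3e8) cpu=12
<idle>-0 [012] 46133.602257: arch_cpu_idle: (c000000000022d08)
<idle>-0 [012] 46133.602259: pseries_lpar_idle: (c0000000000d43c8)
"""

if __name__ == "__main__":
    print(find_unserviced_doorbells(SAMPLE.splitlines()))
    # prints [(12, '46133.602253')]
```

A doorbell that is still in flight when the capture ends will also be reported, so a hit near the end of a trace is only a hint that the cpu needs a closer look.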