Hi Thomas, On Wed, Mar 06, 2013 at 11:18:35AM +0000, Thomas Gleixner wrote: > Some brilliant hardware implementations wake multiple cores when the > broadcast timer fires. This leads to the following interesting > problem: > > CPU0 CPU1 > wakeup from idle wakeup from idle > > leave broadcast mode leave broadcast mode > restart per cpu timer restart per cpu timer > go back to idle > handle broadcast > (empty mask) > enter broadcast mode > programm broadcast device > enter broadcast mode > programm broadcast device > > So what happens is that due to the forced reprogramming of the cpu > local timer, we need to set a event in the future. Now if we manage to > go back to idle before the timer fires, we switch off the timer and > arm the broadcast device with an already expired time (covered by > forced mode). So in the worst case we repeat the above ping pong > forever. > > Unfortunately we have no information about what caused the wakeup, but > we can check current time against the expiry time of the local cpu. If > the local event is already in the past, we know that the broadcast > timer is about to fire and send an IPI. So we mark ourself as an IPI > target even if we left broadcast mode and avoid the reprogramming of > the local cpu timer. > > This still leaves the possibility that a CPU which is not handling the > broadcast interrupt is going to reach idle again before the IPI > arrives. This can't be solved in the core code and will be handled in > follow up patches. > > Reported-by: Jason Liu <liu.h.ja...@gmail.com> > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- > kernel/time/tick-broadcast.c | 59 > ++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 58 insertions(+), 1 deletion(-) > > Index: tip/kernel/time/tick-broadcast.c > =================================================================== > --- tip.orig/kernel/time/tick-broadcast.c > +++ tip/kernel/time/tick-broadcast.c > @@ -393,6 +393,7 @@ int tick_resume_broadcast(void) > > static cpumask_var_t tick_broadcast_oneshot_mask; > static cpumask_var_t tick_broadcast_pending_mask; > +static cpumask_var_t tick_broadcast_force_mask; > > /* > * Exposed for debugging: see timer_list.c > @@ -462,6 +463,10 @@ again: > } > } > > + /* Take care of enforced broadcast requests */ > + cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask); > + cpumask_clear(tick_broadcast_force_mask);
I tested the set and it works fine on a dual cluster big.LITTLE testchip using broadcast timer to manage deep idle cluster states. Just asking a question: the force mask is cleared before sending the timer IPI. Would not be better to clear it after the IPI is sent in tick_do_broadcast(...) ? Can you spot a regression if we do this ? The idle thread checks that mask with irqs disabled, so it is possible that we clear the mask before the CPU has a chance to get the IPI. If we clear the mask after sending the IPI, we are increasing the chances for the idle thread to get it. It is just a further optimization, just asking, thanks. > + > /* > * Wakeup the cpus which have an expired event. > */ > @@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi > struct clock_event_device *bc, *dev; > struct tick_device *td; > unsigned long flags; > + ktime_t now; > int cpu; > > /* > @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi > WARN_ON_ONCE(cpumask_test_cpu(cpu, > tick_broadcast_pending_mask)); > if (!cpumask_test_and_set_cpu(cpu, > tick_broadcast_oneshot_mask)) { > clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN); > - if (dev->next_event.tv64 < bc->next_event.tv64) > + /* > + * We only reprogram the broadcast timer if we > + * did not mark ourself in the force mask and > + * if the cpu local event is earlier than the > + * broadcast event. If the current CPU is in > + * the force mask, then we are going to be > + * woken by the IPI right away. > + */ > + if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) && Is the test against tick_broadcast_force_mask necessary if we add the check in the idle thread before entering idle ? It does not hurt, agreed, and we'd better leave it there, it is just for my own understanding, thanks a lot. Having said that, on the series: Tested-by: Lorenzo Pieralisi <lorenzo.pieral...@arm.com> > + dev->next_event.tv64 < bc->next_event.tv64) > tick_broadcast_set_event(dev->next_event, 1); > } > } else { > @@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi > tick_broadcast_pending_mask)) > goto out; > > + /* > + * If the pending bit is not set, then we are > + * either the CPU handling the broadcast > + * interrupt or we got woken by something else. > + * > + * We are not longer in the broadcast mask, so > + * if the cpu local expiry time is already > + * reached, we would reprogram the cpu local > + * timer with an already expired event. > + * > + * This can lead to a ping-pong when we return > + * to idle and therefor rearm the broadcast > + * timer before the cpu local timer was able > + * to fire. This happens because the forced > + * reprogramming makes sure that the event > + * will happen in the future and depending on > + * the min_delta setting this might be far > + * enough out that the ping-pong starts. > + * > + * If the cpu local next_event has expired > + * then we know that the broadcast timer > + * next_event has expired as well and > + * broadcast is about to be handled. So we > + * avoid reprogramming and enforce that the > + * broadcast handler, which did not run yet, > + * will invoke the cpu local handler. > + * > + * We cannot call the handler directly from > + * here, because we might be in a NOHZ phase > + * and we did not go through the irq_enter() > + * nohz fixups. > + */ > + now = ktime_get(); > + if (dev->next_event.tv64 <= now.tv64) { > + cpumask_set_cpu(cpu, tick_broadcast_force_mask); > + goto out; > + } > + /* > + * We got woken by something else. Reprogram > + * the cpu local timer device. > + */ > tick_program_event(dev->next_event, 1); > } > } > @@ -686,5 +742,6 @@ void tick_broadcast_init(void) > #ifdef CONFIG_TICK_ONESHOT > alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT); > alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT); > + alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT); > #endif > } > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/