On 01/16/2015 02:22 AM, Preeti U Murthy wrote:
> Hi Alexey,
>
> Can you let me know if the following patch fixes the issue for you?
> It did for us on one of our machines that we were investigating on.
This fixes the issue for me as well, thanks!

Tested-by: Alexey Kardashevskiy <a...@ozlabs.ru>

> Anton,
> Can you let me know if the patch fixes the odd perf report that you
> observed? This is the latest fix that there is to it.
>
> -------------------------------START PATCH--------------------------------------
>
> tick/broadcast-hrtimer: Make movement of broadcast hrtimer robust against
> cpu offline
>
> From: Preeti U Murthy <pre...@linux.vnet.ibm.com>
>
> When a cpu on which the broadcast hrtimer is queued goes offline, the
> hrtimer needs to be moved to another cpu. There may be potential race
> conditions here, which can lead to the hrtimer not being queued on any
> cpu.
>
> There was a report of softlockups, with cpus stuck in deep idle states,
> when smt mode switching was being done, especially on systems with a
> smaller number of cpus [1]. This is hard to reproduce on a large machine
> because the chances that an offline cpu was the standby cpu handling
> broadcast are lower. Given the current code, the only situation that
> looks capable of triggering scenarios where broadcast IPIs are not
> delivered is in the cpu offline path. I am at a loss to pinpoint the
> precise scenario.
>
> Therefore, to avoid any possible race condition between cpu offline and
> movement of the broadcast hrtimer, this patch ensures that the act of
> keeping track of broadcasting wakeup IPIs is started afresh after a cpu
> offline operation. This is done by resetting the expiry time of the
> hrtimer to a max value. This has to be done in the CPU_DYING phase so
> that it is visible to all cpus right after exiting stop_machine.
>
> The rationale is that during cpu offline, since all cpus are woken up
> anyway to run stop_machine, we are only required to keep track of
> broadcasting to cpus that henceforth enter deep idle states. This
> ensures that the broadcast hrtimer gets positively queued on a cpu as
> long as there are cpus in deep idle states.
>
> It has to be noted that this code is in addition to retaining the logic
> that we have today in the broadcast code on an offline operation. If
> this logic fails to move the broadcast hrtimer due to a race condition,
> we have the following patch to handle it right.
>
> [1] http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
> There is no issue in programming the decrementer as was presumed and
> stated in this link.
>
> Signed-off-by: Preeti U Murthy <pre...@linux.vnet.ibm.com>
> ---
>  kernel/time/clockevents.c    | 2 +-
>  kernel/time/tick-broadcast.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 5544990..f3907c9 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)
>
>  	case CLOCK_EVT_NOTIFY_CPU_DYING:
>  		tick_handover_do_timer(arg);
> +		tick_shutdown_broadcast_oneshot(arg);
>  		break;
>
>  	case CLOCK_EVT_NOTIFY_SUSPEND:
> @@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
>  		break;
>
>  	case CLOCK_EVT_NOTIFY_CPU_DEAD:
> -		tick_shutdown_broadcast_oneshot(arg);
>  		tick_shutdown_broadcast(arg);
>  		tick_shutdown(arg);
>  		/*
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 066f0ec..1f5eda6 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -677,6 +677,7 @@ static void broadcast_move_bc(int deadcpu)
>  		return;
>  	/* This moves the broadcast assignment to this cpu */
>  	clockevents_program_event(bc, bc->next_event, 1);
> +	bc->next_event.tv64 = KTIME_MAX;
>  }
>
> ---------------------------------------END PATCH-------------------------
>
> Thanks
>
> Regards
> Preeti U Murthy
>
> On 01/07/2015 03:07 PM, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> "ppc64_cpu --smt=off" produces multiple errors on the latest upstream
>> kernel (sha1 bdec419):
>>
>> NMI watchdog: BUG: soft lockup - CPU#20 stuck for 23s! [swapper/20:0]
>>
>> or
>>
>> INFO: rcu_sched detected stalls on CPUs/tasks: { 2 7 8 9 10 11 12 13
>> 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 6,
>> t=2102 jiffies, g=1617, c=1616, q=1441)
>>
>> and many others, all about lockups.
>>
>> I did bisecting and found out that reverting these helps:
>>
>> 77b54e9f213f76a23736940cf94bcd765fc00f40 powernv/powerpc: Add winkle
>> support for offline cpus
>> 7cba160ad789a3ad7e68b92bf20eaad6ed171f80 powernv/cpuidle: Redesign idle
>> states management
>> 8eb8ac89a364305d05ad16be983b7890eb462cc3 powerpc/powernv: Enable Offline
>> CPUs to enter deep idle states
>>
>> btw reverting just two of them produces a compile error.
>>
>> It is pseries_le_defconfig, POWER8 machine:
>> timebase : 512000000
>> platform : PowerNV
>> model    : palmetto
>> machine  : PowerNV palmetto
>> firmware : OPAL v3
>>
>> Please help to fix it. Thanks.
>>

--
Alexey

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev