On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
> Cédric Le Goater <c...@kaod.org> writes:
>
> > Hello,
> >
> > When a CPU is stopped with the 'stop-self' RTAS call, its state
> > 'halted' is switched to 1 and, in this case, the MSR is not taken into
> > account anymore in the cpu_has_work() routine. Only the pending
> > hardware interrupts are checked with their LPCR:PECE* enablement bit.
> >
> > If the DECR timer fires after 'stop-self' is called and before the CPU
> > 'stop' state is reached, the nearly-dead CPU will have some work to do
> > and the guest will crash. This case happens very frequently with the
> > not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> > occasionally fired but after 'stop' state, so no work is to be done
> > and the guest survives.
> >
> > I suspect there is a race between the QEMU mainloop triggering the
> > timers and the TCG CPU thread but I could not quite identify the root
> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
> > when the CPU is halted and reenable it when the CPU is restarted.
>
> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
> reboot case, as reboot path does not call stop-cpu rtas call.

Shouldn't we do it in set_papr too, and only turn it on for the boot CPU
and in the start-cpu RTAS call? Same with the other PECEs, in fact...

> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>  
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>  
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>
>
> Regards
> Nikunj
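
To make the suggestion above concrete, here is a rough standalone sketch of
the idea: keep all the wakeup enables off for a halted CPU and only set them
for the boot CPU or on start-cpu. This is not QEMU code. The DemoCPU type,
the demo_* helpers and the DEMO_PECE_* bit values are all made up for
illustration; the real LPCR layout differs between P7/P8 and P9, and the
running case ignores the MSR entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only (not the real LPCR layout). */
#define DEMO_PECE_EXT   (1ull << 0)   /* wake up on external interrupt */
#define DEMO_PECE_DECR  (1ull << 1)   /* wake up on decrementer */
#define DEMO_PECE_ALL   (DEMO_PECE_EXT | DEMO_PECE_DECR)

/* Hypothetical, heavily reduced CPU state. */
typedef struct DemoCPU {
    uint64_t lpcr;
    bool halted;
    bool ext_pending;
    bool decr_pending;
} DemoCPU;

/* stop-self (or reset of a secondary): halt and mask every wakeup source. */
static void demo_cpu_stop(DemoCPU *cpu)
{
    cpu->halted = true;
    cpu->lpcr &= ~DEMO_PECE_ALL;
}

/* start-cpu (or reset of the boot CPU): re-enable wakeups, then run. */
static void demo_cpu_start(DemoCPU *cpu)
{
    cpu->lpcr |= DEMO_PECE_ALL;
    cpu->halted = false;
}

/* Reset path: only the boot CPU comes up with its wakeup enables set. */
static void demo_cpu_reset(DemoCPU *cpu, bool is_boot_cpu)
{
    cpu->ext_pending = false;
    cpu->decr_pending = false;
    if (is_boot_cpu) {
        demo_cpu_start(cpu);
    } else {
        demo_cpu_stop(cpu);
    }
}

/* A halted CPU only has work if a pending interrupt is enabled as a wakeup. */
static bool demo_cpu_has_work(const DemoCPU *cpu)
{
    if (!cpu->halted) {
        /* Simplification: the MSR is not modelled here. */
        return cpu->ext_pending || cpu->decr_pending;
    }
    return (cpu->ext_pending && (cpu->lpcr & DEMO_PECE_EXT)) ||
           (cpu->decr_pending && (cpu->lpcr & DEMO_PECE_DECR));
}
```

With this shape, a decrementer that fires on a stopped secondary leaves
demo_cpu_has_work() returning false until demo_cpu_start() is called, and the
reboot path gets the same behaviour without going through stop-self.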