Hi Mike,

There is a bit of history to this code, but not in a good way :)

Michael Roth <mdr...@linux.vnet.ibm.com> writes:
> For a power9 KVM guest with XIVE enabled, running a test loop
> where we hotplug 384 vcpus and then unplug them, the following traces
> can be seen (generally within a few loops) either from the unplugged
> vcpu:
>
> [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> [ 1767.952311] list_del corruption. next->prev should be c00a000002470208,
> but was c00a000002470048
...
>
> At that point the worker thread assumes the unplugged CPU is in some
> unknown/dead state and procedes with the cleanup, causing the race with
> the XIVE cleanup code executed by the unplugged CPU.
>
> Fix this by inserting an msleep() after each RTAS call to avoid

We previously had an msleep(), but it was removed:

  b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")

> pseries_cpu_die() returning prematurely, and double the number of
> attempts so we wait at least a total of 5 seconds. While this isn't an
> ideal solution, it is similar to how we dealt with a similar issue for
> cede_offline mode in the past (940ce422a3).

Thiago tried to fix this previously but there was a bit of discussion
that didn't quite resolve:

  https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauer...@linux.ibm.com/

Spinning forever seems like a bad idea, but as has been demonstrated at
least twice now, continuing when we don't know the state of the other
CPU can lead to straight up crashes.

So I think I'm persuaded that it's preferable to have the kernel stuck
spinning rather than oopsing.

I'm 50/50 on whether we should have a cond_resched() in the loop. My
first instinct is no, if we're stuck here for 20s a stack trace would be
good. But then we will probably hit that on some big and/or heavily
loaded machine.

So possibly we should call cond_resched() but have some custom logic in
the loop to print a warning if we are stuck for more than some
sufficiently long amount of time.

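To make that last idea concrete, here is a rough userspace sketch, not kernel code and not a proposed patch. It assumes sched_yield() as a stand-in for cond_resched() and fprintf() as a stand-in for a kernel warning; wait_until_stopped(), countdown_query(), the status values and the warning interval are all hypothetical names and numbers made up for illustration.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical status codes; illustrative stand-ins for the RTAS query result. */
enum { STATUS_STOPPED = 0, STATUS_NOT_STOPPED = 2 };

/* Hypothetical callback type standing in for smp_query_cpu_stopped(). */
typedef int (*query_fn)(void *ctx);

/*
 * Poll until the query reports STATUS_STOPPED, yielding between polls.
 * The loop never gives up, but it prints a warning each time another
 * warn_interval_sec seconds have passed without the CPU stopping.
 * Returns the number of warnings printed.
 */
static int wait_until_stopped(query_fn query, void *ctx, double warn_interval_sec)
{
    time_t start = time(NULL);
    double next_warn = warn_interval_sec;
    int warnings = 0;

    assert(query != NULL);

    while (query(ctx) != STATUS_STOPPED) {
        double waited = difftime(time(NULL), start);

        if (waited >= next_warn) {
            fprintf(stderr, "still waiting for CPU to stop after %.0f seconds\n",
                    waited);
            warnings++;
            next_warn += warn_interval_sec;
        }
        sched_yield(); /* userspace stand-in for cond_resched() */
    }
    return warnings;
}

/* Demo query: reports "not stopped" *ctx times, then "stopped". */
static int countdown_query(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return STATUS_NOT_STOPPED;
    }
    return STATUS_STOPPED;
}
```

In the kernel the elapsed-time check would presumably be done against jiffies (for example with time_after()) rather than time(), and the interval would need to be long enough not to fire on a big or heavily loaded machine.
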
> Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt
> controller")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588

This is not public.

I tend to trim Bugzilla links from the change log, because I'm not
convinced they will last forever, but it is good to have them in the
mail archive.

cheers

> Cc: Michael Ellerman <m...@ellerman.id.au>
> Cc: Cedric Le Goater <c...@kaod.org>
> Cc: Greg Kurz <gr...@kaod.org>
> Cc: Nathan Lynch <nath...@linux.ibm.com>
> Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index c6e0d8abf75e..3cb172758052 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
>          int cpu_status = 1;
>          unsigned int pcpu = get_hard_smp_processor_id(cpu);
>
> -        for (tries = 0; tries < 25; tries++) {
> +        for (tries = 0; tries < 50; tries++) {
>                  cpu_status = smp_query_cpu_stopped(pcpu);
>                  if (cpu_status == QCSS_STOPPED ||
>                      cpu_status == QCSS_HARDWARE_ERROR)
>                          break;
> -                cpu_relax();
> -
> +                msleep(100);
>          }
>
>          if (cpu_status != 0) {
> --
> 2.17.1