On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> > Leonardo Brás <leobra...@gmail.com> writes:
> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > > Leonardo Brás <leobra...@gmail.com> writes:
> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > > the following RTAS functions:
> > > > > >
> > > > > > - ibm,get-xive
> > > > > > - ibm,int-off
> > > > > > - ibm,int-on
> > > > > > - ibm,set-xive
> > > > > >
> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > > RTAS calls:
> > > > > >
> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o....@linux.ibm.com/
> > > > > >
> > > > > > Recent discussion with firmware development makes it clear that this
> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > >
> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > >
> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > > property described in the spec applies only to the individual RTAS
> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > > simultaneously, but it must adhere to the more general requirement to
> > > > serialize with other RTAS functions.
> > > >
> > >
> > > I see. Thanks for explaining that part!
> > > I agree: reentrant calls that way don't look as useful on Linux than I
> > > previously thought.
> > >
> > > OTOH, I think that instead of reverting the change, we could make use of the
> > > correct information and fix the current implementation. (This could help when we
> > > do the same rtas call in multiple cpus)
> >
> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> > that. I'm not seeing the need.
> >
> > > I have an idea of a patch to fix this.
> > > Do you think it would be ok if I sent that, to prospect being an alternative to
> > > this reversion?
> >
> > It is my preference, and I believe it is more common, to revert to the
> > well-understood prior state, imperfect as it may be. The revert can be
> > backported to -stable and distros while development and review of
> > another approach proceeds.
>
> Ok then, as long as you are aware of the kdump bug, I'm good.
>
> FWIW:
> Reviewed-by: Leonardo Bras <leobra...@gmail.com>

A shame. I guess a reader/writer lock would not be much help because
the crash is probably more likely to hit longer running rtas calls?

Alternative is just cheat and do this...?

Thanks,
Nick

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..89728714a06e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
+#include <linux/panic.h>
 
 #include <asm/interrupt.h>
 #include <asm/rtas.h>
@@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
 {
 	unsigned long flags;
 
+	if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
+		/*
+		 * Crash in progress on this CPU. Other CPUs should be
+		 * stopped by now, so skip the lock in case it was being
+		 * held, and is now needed for crashing e.g., kexec
+		 * (machine_kexec_mask_interrupts) requires rtas calls.
+		 *
+		 * It's possible this could have caused rtas state breakage
+		 * but the alternative is deadlock.
+		 */
+		return 0;
+	}
+
 	local_irq_save(flags);
 	preempt_disable();
 	arch_spin_lock(&rtas.lock);
@@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
 
 static void unlock_rtas(unsigned long flags)
 {
+	if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+		return;
+
 	arch_spin_unlock(&rtas.lock);
 	local_irq_restore(flags);
 	preempt_enable();
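
For anyone who wants to poke at the idea outside the kernel, here is a
minimal userspace sketch of the same bypass, assuming C11 atomics in
place of arch_spin_lock() and a plain int in place of
raw_smp_processor_id(). The names panic_cpu_model, rtas_lock_model,
lock_rtas_model() and unlock_rtas_model() are hypothetical stand-ins and
not kernel symbols; IRQ and preemption handling is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical stand-ins for the kernel's panic_cpu and rtas.lock. */
static atomic_int panic_cpu_model = PANIC_CPU_INVALID;
static atomic_bool rtas_lock_model = false;

/*
 * Take the modelled RTAS lock unless this_cpu is the panicking CPU.
 * Returns true if the lock was taken, false if it was bypassed.
 */
static bool lock_rtas_model(int this_cpu)
{
	/* Crash path: the lock may be held by a stopped CPU, so skip it. */
	if (atomic_load(&panic_cpu_model) == this_cpu)
		return false;

	/* Simple test-and-set spin until the previous value was "unlocked". */
	while (atomic_exchange_explicit(&rtas_lock_model, true,
					memory_order_acquire))
		;
	return true;
}

/* Release the modelled lock; a no-op on the panicking CPU. */
static void unlock_rtas_model(int this_cpu)
{
	if (atomic_load(&panic_cpu_model) == this_cpu)
		return;

	/* A non-panicking caller must own the lock at this point. */
	assert(atomic_load(&rtas_lock_model));
	atomic_store_explicit(&rtas_lock_model, false, memory_order_release);
}
```

In this model, if "CPU 1" takes the lock and "CPU 0" then becomes the
panic CPU, lock_rtas_model(0) returns immediately instead of spinning
and the lock stays marked as held. The sketch also makes one property of
the approach visible: a CPU that took the lock normally and only
afterwards became the panic CPU would skip the release in
unlock_rtas_model(), which looks tolerable on a crash path but is worth
keeping in mind.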