Quoting Michael Roth (2020-08-05 17:29:28)
> Quoting Michael Roth (2020-08-04 23:37:32)
> > Quoting Michael Ellerman (2020-08-04 22:07:08)
> > > Greg Kurz <gr...@kaod.org> writes:
> > > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > > Michael Ellerman <m...@ellerman.id.au> wrote:
> > > >> Spinning forever seems like a bad idea, but as has been demonstrated
> > > >> at least twice now, continuing when we don't know the state of the
> > > >> other CPU can lead to straight up crashes.
> > > >>
> > > >> So I think I'm persuaded that it's preferable to have the kernel
> > > >> stuck spinning rather than oopsing.
> > > >>
> > > >
> > > > +1
> > > >
> > > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > > >> first instinct is no; if we're stuck here for 20s a stack trace
> > > >> would be good. But then we will probably hit that on some big and/or
> > > >> heavily loaded machine.
> > > >>
> > > >> So possibly we should call cond_resched() but have some custom logic
> > > >> in the loop to print a warning if we are stuck for more than some
> > > >> sufficiently long amount of time.
> > > >
> > > > How long should that be?
> > >
> > > Yeah, good question.
> > >
> > > I guess step one would be seeing how long it can take on the 384 vcpu
> > > machine. And we can probably test on some other big machines.
> > >
> > > Hopefully Nathan can give us some idea of how long he's seen it take
> > > on large systems? I know he was concerned about the 20s timeout of the
> > > softlockup detector.
> > >
> > > Maybe a minute or two?
> >
> > Hmm, so I took a stab at this where I called cond_resched() after
> > every 5 seconds of polling and printed a warning at the same time (FWIW
> > that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> > system using KVM running the 384vcpu unplug loop).
> >
> > But it sounds like that's not quite what you had in mind. How
> > frequently do you think we should call cond_resched()? Maybe after 25
> > iterations of polling smp_query_cpu_stopped() to keep the original
> > behavior somewhat similar?
> >
> > I'll let the current patch run on the mihawk system overnight in the
> > meantime so we at least have that data point, but it would be good to
> > know what things look like on a large pHyp machine.
>
> At one point I did manage to get the system into a state where unplug
> operations were taking 1-2s, but still not enough to trigger any
> 5s warning, and I wasn't able to reproduce that in subsequent runs.
>
> I also tried reworking the patch so that we print a warning and
> cond_resched() after 200 ms to make sure that path gets executed, but
> only managed to trigger the warning twice after a few hours.
>
> So, if we print a warning after a couple of minutes, that seems pretty
> conservative as far as avoiding spurious warnings. And if we
> cond_resched() after 25 loops of polling (~0.1 ms in the cases

~0.1 seconds I mean

> that caused the original crash), that would avoid most of the
> default RCU/lockup warnings.
>
> But having a second timeout to trigger the cond_resched() after some
> set interval like 2s seems more deterministic, since we're less
> susceptible to longer delays due to things like the RTAS calls
> contending for QEMU's global mutex in the KVM case.
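For illustration, a minimal sketch of the loop shape being discussed,
assuming the existing pseries helpers (smp_query_cpu_stopped() and the
QCSS_* status codes) and the pcpu/cpu variables already present in
pseries_cpu_die(). The 2s cond_resched() deadline and 120s warning
interval are illustrative values from the thread, not the final patch:

	unsigned long resched_deadline = jiffies + 2 * HZ;
	unsigned long warn_deadline = jiffies + 120 * HZ;
	int cpu_status;

	/* Spin until the CPU is confirmed stopped rather than giving
	 * up after a fixed number of polling attempts. */
	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;

		/* Yield every ~2s so a long wait doesn't trigger RCU
		 * stall or soft-lockup splats on large or heavily
		 * loaded machines. */
		if (time_after(jiffies, resched_deadline)) {
			cond_resched();
			resched_deadline = jiffies + 2 * HZ;
		}

		/* After a couple of minutes, let the user know we are
		 * still waiting on the dying CPU. */
		if (time_after(jiffies, warn_deadline)) {
			pr_warn("CPU %d still not stopped, continuing to wait\n",
				cpu);
			warn_deadline = jiffies + 120 * HZ;
		}

		cpu_relax();
	}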
> > > Thanks!
> > >
> > > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE
> > > >> > interrupt controller")
> > > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > > >>
> > > >> This is not public.
> > > >
> > > > I'll have a look at changing that.
> > >
> > > Thanks.
> > >
> > > cheers