On Fri, 2018-07-20 at 16:29 +1000, Michael Ellerman wrote: > Michael Neuling <mi...@neuling.org> writes: > > On Fri, 2018-07-20 at 12:32 +1000, Michael Ellerman wrote: > > > Michael Neuling <mi...@neuling.org> writes: > > > > On Wed, 2018-07-18 at 13:42 +0530, Gautham R Shenoy wrote: > > > > > On Wed, Jul 18, 2018 at 09:24:19AM +1000, Michael Neuling wrote: > > > > > > > > > > > > > DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER); > > > > > > > diff --git a/arch/powerpc/kernel/idle_book3s.S > > > > > > > b/arch/powerpc/kernel/idle_book3s.S > > > > > > > index d85d551..5069d42 100644 > > > > > > > --- a/arch/powerpc/kernel/idle_book3s.S > > > > > > > +++ b/arch/powerpc/kernel/idle_book3s.S > > > > > > > @@ -120,6 +120,9 @@ power9_save_additional_sprs: > > > > > > > mfspr r4, SPRN_MMCR2 > > > > > > > std r3, STOP_MMCR1(r13) > > > > > > > std r4, STOP_MMCR2(r13) > > > > > > > + > > > > > > > + mfspr r3, SPRN_SPRG3 > > > > > > > + std r3, STOP_SPRG3(r13) > > > > > > > > > > > > We don't need to save it. Just restore it from paca->sprg_vdso > > > > > > which > > > > > > should > > > > > > never change. > > > > > > > > > > Ok. I will respin a patch to restore SPRG3 from paca->sprg_vdso. > > > > > > > > > > > > > > > > > How can we do better at catching these missing SPRGs? > > > > > > > > > > We can go through the list of SPRs from the POWER9 User Manual and > > > > > document explicitly why we don't have to save/restore certain SPRs > > > > > during the execution of the stop instruction. Does this sound ok ? > > > > > > > > > > (Ref: Table 4-8, Section 4.7.3.4 from the POWER9 User Manual > > > > > accessible from > > > > > https://openpowerfoundation.org/?resource_lib=power9-processor-users-m > > > > > anua > > > > > l) > > > > > > > > I was thinking of a boot time test case built into linux. linux has some > > > > boot > > > > time test cases which you can enable via CONFIG options. > > > > > > > > Firstly you could see if an SPR exists using the same trick xmon does in > > > > dump_one_spr(). Then once you have a list of usable SPRs, you could > > > > write > > > > all > > > > the known ones (I assume you'd have to leave out some, like the PSSCR), > > > > then > > > > set > > > > > > Write what value? > > > > > > Ideally you want to write a random bit pattern to reduce the chance > > > that only some bits are being restored. > > > > The xmon dump_one_spr() trick tries to work around that by writing one > > random > > value and then a different one to see if it really is a nop. > > > > > But you can't do that because writing a value to an SPRs has an effect. > > > > Sure that's a concern but xmon seems to get away with it. > > I don't think it writes, but maybe I'm reading the code wrong.
You're right, sorry. It's the write the GPR that becomes a NOP when the SPR is not there. I misremembered how it worked. Maybe that won't work stop since we'd need to be able change the SPR value to ensure we don't hit the reset value after a stop state. We'd be able to detect SPRs that that change from it's reset value but not those that are already at their reset value. > Writing a random value to the MSR could be fun :) Fortunately the MSR is not an SPR :-P > > > > Yeah, I'm not convinced it'll work either but it would be a nice piece of > > test > > infrastructure to have if it does work. > > Yeah I guess I'd rather we worked on 1) and 2) below first :) ok > > We'd still need to marry up the SPR numbers we get from the test to what's > > actually being restored in Linux. > > > > > But there's a much simpler solution, we should 1) have a selftest for > > > getcpu() and 2) we should be running the glibc (I think?) test suite > > > that found this in the first place. It's frankly embarrassing that we > > > didn't find this. > > > > Yeah, we should do that also, but how do we catch the next SPR we are > > missing. > > I'd like some systematic way of doing that rather than wack-a-mole. > > Whack-a-mole 😂😂😂😂 I preferred waking them :-) > We could also improve things by documenting how each SPR is handled, eg. > is it saved/restored across idle, syscall, KVM etc. And possibly that > could even become code that defines how SPRs are handled, rather than it > all being done ad-hoc. Yeah. It's complicated by linux calling opal_slw_set_reg() to change what's saved. This was part of the reason I'd hoped doing a linux test case would help as we could do it after those calls. Mikey