On Sat, Jul 19, 2014 at 10:58:18AM -0700, Marcel Moolenaar wrote:
> 
> On Jul 18, 2014, at 9:07 AM, Konstantin Belousov <[email protected]> wrote:
> 
> > It was mentioned somewhere recently that a typical BIOS today configures
> > NMI delivery on hardware events as broadcast. When I developed
> > the dmar(4) busdma backend, I indeed met the problem, and wrote a
> > prototype which avoided starting ddb on all cores. Instead, the patch
> > implements a custom spinlock which allows only one core to win; the other
> > cores ignore the NMI by spinning on the lock.
> > 
> > The issue which I see on at least two different machines with different
> > Intel chipsets is that the NMI is somehow sticky, i.e. it is re-delivered
> > after the handler executes iret. I am not sure what the problem is,
> > whether it is due to the hardware needing some ACK, or a bug in the code.
> > 
> > Anyway, even on a two-core machine, having both cores enter the NMI
> > handler simultaneously makes ddb impossible to use, so I believe the
> > patch is an improvement. I took measures to ensure that reboot from the
> > ddb prompt works.
> > 
> > Thoughts?
> 
> One may call kdb_enter on different CPUs at the same time and it's
> also possible to call panic on multiple CPUs at the same time (but
> we serialize panic() right now). What if we let kdb_enter et al. deal
> with concurrency, instead of doing it specifically for NMIs?

Then, on an 80-thread machine, an NMI broadcast gives me 80 ddb sessions, as it does now. With your proposal it would be somewhat better, since the sessions are serialized and I can do the reboot from the first one.
Still, I hope to understand what I am missing that would stop the NMI from being re-delivered in a loop. Then, having only one ddb entry would mean that I only need to return from it once.

> 
> Also: we may want to do something other than going to the debugger
> when we see an NMI. More complexity in the NMI handler that is specific
> to entering the debugger seems to move us away from doing other
> things more easily.

I agree there.

> 
> Aside: I've always wanted the kernel debugger to be able to switch
> to a different CPU so that you can create DDB commands that dump
> hardware resources like TLBs, etc. To support this, you want the
> KDB layer to have good CPU handling, which possibly makes it also
> a good place to handle concurrent entry into the debugger from
> different CPUs.

Me too. I have another half-finished patch which does this: it allows migrating ddb from one CPU to another. It worked by signalling the destination CPU that it should activate, while the source CPU started spinning. I do not remember exactly which problems were left unresolved. I needed this because some state is CPU-local: it cannot be accessed from other cores and is not saved in the pcb. I definitely looked at the EFER and MISC_FEATURES MSRs, and at the local APIC state.
