On Fri, Apr 28, 2006 at 06:19:24PM -0400, Don Zickus wrote:
> When kexec goes to issue an nmi it uses set_nmi_callback() to have the
> other cpus execute the proper shutdown code. Unfortunately, under certain
> situations set_nmi_callback will fail (ie oprofile has it reserved
> already). This will cause kexec/kdump to hang and do nothing. :(
>

Looking at set_nmi_callback(), there does not seem to be anything that
would make it fail. I think enabling profiling support only disables the
regular NMI generation from the LAPIC for watchdog purposes, because the
performance registers being used for NMI generation are claimed back. So
even if profiling is enabled, kexec/kdump should not fail.

> After talking to Andi, he mentioned that subsystems should be using the
> notifier callback on the die chain instead. The included patch
> incorporates that. The priority is set to 0, hopefully causing the
> notifier to be the first one called.
>

OK, if the goal is to make the subsystems rely on the die notifier chain
instead of nmi_callback and to get rid of the set_nmi_callback()
interface, then it spells some problems for kdump, as kdump is different
from other subsystems. You rightly pointed out the question: what if the
chain is corrupted, or some die notifier function hangs?

It looks like notifiers are called highest priority value first. Looking
at the code, a notifier with priority 0x7fffffff will be called first.
But there is still no guarantee: among notifiers with this priority,
those that register first will be called first. Kdump registers at the
end and hence will be called last, so it is liable to fail.

> However, after talking to Vivek about this, he mentioned that he could
> still invision conditions (the die_chain is corrupted) where even this
> procedure might not work.
>
> I believe using the notifier is safer for now. Plus I am working on a
> patch that removes the set_nmi_callback()/unset_nmi_callback() (hence my
> push for this patch), so I would like to have this patch go in. :)
>
> Vivek also mentioned some other work at replacing the nmi stack
> completely, which would make this patch moot. But I don't know the state
> of it.

It looks like pointing the NMI vector in the IDT at a crash-specific
function is the safest solution for kdump. We had discussed this in the
past in the context of stack overflow safe dump.
Fernando from NTT Data had posted the patches for it. Maybe you can fish
out those patches and extract just the piece of code that replaces the
NMI vector handling.

Thanks
Vivek
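
A rough userspace sketch of the ordering concern above, not kernel code. It assumes the chain is kept sorted by descending priority, with ties resolved in registration order, as described in the message. The struct, the function names, and the IDs are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of a priority-sorted notifier chain; NOT kernel code. */
struct notifier {
    int id;                 /* hypothetical tag used to record call order */
    int priority;
    struct notifier *next;
};

/*
 * Keep the list sorted by descending priority. A new entry is placed
 * after existing entries of equal priority, so earlier registrants at
 * the same priority stay ahead of later ones.
 */
static void chain_register(struct notifier **link, struct notifier *n)
{
    while (*link != NULL && n->priority <= (*link)->priority)
        link = &(*link)->next;
    n->next = *link;
    *link = n;
}

/* Walk the chain head to tail, recording ids; returns how many were written. */
static int chain_order(const struct notifier *head, int *out, int max)
{
    int n = 0;

    for (; head != NULL; head = head->next) {
        assert(n < max);    /* caller must supply a large enough buffer */
        out[n++] = head->id;
    }
    return n;
}

enum { ID_OTHER_MAX = 1, ID_KDUMP_MAX = 2, ID_KDUMP_ZERO = 3 };

/* Assumed scenario: another subsystem claims top priority before kdump does. */
static int demo_order(int *out, int max)
{
    struct notifier other      = { ID_OTHER_MAX,  0x7fffffff, NULL };
    struct notifier kdump_max  = { ID_KDUMP_MAX,  0x7fffffff, NULL };
    struct notifier kdump_zero = { ID_KDUMP_ZERO, 0,          NULL };
    struct notifier *head = NULL;

    chain_register(&head, &kdump_zero);  /* registered first, priority 0    */
    chain_register(&head, &other);       /* top priority, registered early  */
    chain_register(&head, &kdump_max);   /* top priority, registered late   */
    return chain_order(head, out, max);
}
```

Calling demo_order() with a buffer of at least three ints fills it in call order. In this model the priority 0 entry ends up last even though it registered first, and the late top-priority entry still runs behind the earlier one with the same priority.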
