On Fri, Apr 28, 2006 at 06:19:24PM -0400, Don Zickus wrote:
> When kexec goes to issue an nmi it uses set_nmi_callback() to have the
> other cpus execute the proper shutdown code.  Unfortunately, under certain
> situations set_nmi_callback will fail (ie oprofile has it reserved
> already).  This will cause kexec/kdump to hang and do nothing.  :(
> 

Looking at set_nmi_callback(), there does not seem to be anything
that would make it fail. I think enabling profiling support only
disables the regular NMI generation from the LAPIC for watchdog purposes,
because the performance registers used for NMI generation are claimed back.

So even if profiling is enabled, kexec/kdump should not fail.

> After talking to Andi, he mentioned that subsystems should be using the
> notifier callback on the die chain instead.  The included patch
> incorporates that.  The priority is set to 0, hopefully causing the
> notifier to be the first one called.  
> 

OK, if the goal is to force subsystems to rely on the die notifier chain
instead of nmi_callback and to get rid of the set_nmi_callback() interface,
then it spells some problems for kdump, as kdump is different from other
subsystems. You rightly pointed out the cases where the chain is corrupted
or some die notifier function hangs.

It looks like notifiers are called in decreasing priority order: looking
at the code, a notifier with priority 0x7fffffff will be called first.
But still there is no guarantee. Among notifiers with this priority, those
registered first will be called first. Kdump registers at the end and hence
will be called last among them, so it is liable to fail.
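
To illustrate the ordering concern, here is a minimal userspace sketch of a
priority-sorted notifier chain. It is a simplified model, not the kernel
code: the names chain_register(), chain_call(), record_tag(), struct
call_log and the tag field are hypothetical, and the insertion rule (a new
entry goes in front of the first entry with strictly lower priority) is my
reading of the registration code, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0
#define NOTIFY_STOP 1   /* simplified stand-in for the kernel's stop flag */

struct call_log {
    char order[8];      /* tags of callbacks, in the order they ran */
    size_t len;
};

struct notifier {
    int (*call)(struct notifier *self, void *data);
    struct notifier *next;
    int priority;
    char tag;           /* hypothetical field, only used to record the order */
};

/* Insert n in front of the first entry with strictly lower priority. */
void chain_register(struct notifier **list, struct notifier *n)
{
    assert(n != NULL);
    while (*list != NULL) {
        if (n->priority > (*list)->priority)
            break;
        list = &(*list)->next;
    }
    n->next = *list;
    *list = n;
}

/* Walk the chain from the head; stop early if a callback asks for it. */
int chain_call(struct notifier *list, void *data)
{
    int ret = NOTIFY_DONE;

    while (list != NULL) {
        ret = list->call(list, data);
        if (ret == NOTIFY_STOP)
            break;
        list = list->next;
    }
    return ret;
}

/* Callback shared by all demo notifiers: append our tag to the log. */
int record_tag(struct notifier *self, void *data)
{
    struct call_log *log = data;

    if (log->len < sizeof(log->order) - 1)
        log->order[log->len++] = self->tag;
    return NOTIFY_DONE;
}

/* Register three notifiers and record the order in which they run. */
void demo_call_order(struct call_log *log)
{
    struct notifier *chain = NULL;
    struct notifier early = { record_tag, NULL, 0x7fffffff, 'E' };
    struct notifier low   = { record_tag, NULL, 0,          'L' };
    struct notifier kdump = { record_tag, NULL, 0x7fffffff, 'K' };

    chain_register(&chain, &early);
    chain_register(&chain, &low);
    chain_register(&chain, &kdump);   /* kdump registers last */
    chain_call(chain, log);
}
```

In this model the kdump entry ('K') runs after the earlier registrant with
the same priority ('E'), so if that earlier callback hung or returned the
stop value, the kdump callback would never be reached.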


> However, after talking to Vivek about this, he mentioned that he could
> still invision conditions (the die_chain is corrupted) where even this
> procedure might not work.  
> 
> I believe using the notifier is safer for now.  Plus I am working on a
> patch that removes the set_nmi_callback()/unset_nmi_callback() (hence my
> push for this patch), so I would like to have this patch go in.  :)
> 
> Vivek also mentioned some other work at replacing the nmi stack
> completely, which would make this patch moot.  But I don't know the state
> of it.  

It looks like pointing the NMI vector in the IDT at a crash-specific function
is the safest solution for kdump. We discussed this in the past in the
context of a stack-overflow-safe dump. Fernando from NTT Data had posted
patches for it. Maybe you can fish out those patches and extract just
the piece of code that replaces the NMI vector handling.
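
As a rough picture of the idea, the following userspace sketch models the
vector table as an array of function pointers and swaps the NMI entry for a
crash-specific handler. This is only an illustration under stated
assumptions: a real IDT holds gate descriptors rather than C function
pointers, and every name here (vector_table, deliver_vector(),
install_crash_nmi_handler(), and so on) is hypothetical and not taken from
the patches mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 256
#define NMI_VECTOR 2            /* NMI is vector 2 on x86 */

/* Which path an NMI ended up taking in this model. */
enum nmi_path { PATH_NONE, PATH_NOTIFIER_CHAIN, PATH_CRASH_DIRECT };

typedef void (*vector_handler)(enum nmi_path *taken);

/* Stand-in for the IDT: one handler slot per vector. */
static vector_handler vector_table[NR_VECTORS];

/* Normal handler: would go on to walk callbacks and notifier chains. */
void default_nmi_handler(enum nmi_path *taken)
{
    *taken = PATH_NOTIFIER_CHAIN;
}

/* Crash handler: would save state and halt the CPU, touching no chain. */
void crash_nmi_handler(enum nmi_path *taken)
{
    *taken = PATH_CRASH_DIRECT;
}

void install_default_handlers(void)
{
    vector_table[NMI_VECTOR] = default_nmi_handler;
}

/* What the crash path would do just before sending NMIs to other CPUs. */
void install_crash_nmi_handler(void)
{
    vector_table[NMI_VECTOR] = crash_nmi_handler;
}

/* Dispatch a vector through the table and report the path taken. */
enum nmi_path deliver_vector(int vector)
{
    enum nmi_path taken = PATH_NONE;

    assert(vector >= 0 && vector < NR_VECTORS);
    if (vector_table[vector] != NULL)
        vector_table[vector](&taken);
    return taken;
}
```

The point of the sketch is that once the entry is replaced, delivery no
longer depends on registration order or on the state of any chain, which is
why this route looks safer for kdump.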

Thanks
Vivek
_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot
