On Monday 01 May 2006 21:56, Vivek Goyal wrote:
> On Fri, Apr 28, 2006 at 06:19:24PM -0400, Don Zickus wrote:
> > When kexec goes to issue an NMI, it uses set_nmi_callback() to have the
> > other CPUs execute the proper shutdown code.  Unfortunately, under certain
> > situations set_nmi_callback() will fail (i.e. oprofile has it reserved
> > already).  This will cause kexec/kdump to hang and do nothing.  :(
> > 
> 
> Looking at set_nmi_callback(), there does not seem to be anything that
> would make it fail. I think enabling profiling support only disables the
> regular NMI generation from the LAPIC for watchdog purposes, because the
> performance registers used for NMI generation are claimed back.
> 
> So even if profiling is enabled, kexec/kdump should not fail.

Profiling just registers a lower-priority callback. Also, with Don's
changes, profiling will only trigger when there are profile events
anyway, so all the interactions will be much cleaner.
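Vivek's reading is that set_nmi_callback() has no failure path. A minimal userspace model makes that concrete. It assumes, as a simplification rather than the actual arch code, that the interface is a single global function pointer that registration overwrites; `model_set_nmi_callback()`, `model_unset_nmi_callback()`, `model_do_nmi()` and the two handlers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-slot model, loosely in the spirit of
 * set_nmi_callback(); this is NOT the real kernel code. */
typedef int (*nmi_cb_t)(int cpu);

/* Default handler: returns 0, meaning "nobody claimed this NMI". */
static int dummy_nmi(int cpu) { (void)cpu; return 0; }

/* The one and only slot: whoever registered last owns every NMI. */
static nmi_cb_t nmi_callback = dummy_nmi;

/* Registration cannot fail; it simply overwrites the previous owner. */
static void model_set_nmi_callback(nmi_cb_t cb)
{
    assert(cb != NULL);
    nmi_callback = cb;
}

/* Unsetting restores the default, regardless of who calls it. */
static void model_unset_nmi_callback(void)
{
    nmi_callback = dummy_nmi;
}

/* Dispatch: exactly one handler runs per NMI. */
static int model_do_nmi(int cpu)
{
    return nmi_callback(cpu);
}

/* Illustrative handlers that return an id so the caller can tell who ran. */
static int profiler_nmi(int cpu) { (void)cpu; return 1; }
static int kdump_nmi(int cpu)    { (void)cpu; return 2; }
```

In this model a second registration never fails; it silently displaces the first owner, and an unset from either party clears the slot for both. That single-owner property is what moving to a notifier chain replaces with a list of handlers.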

> 
> > After I talked to Andi, he mentioned that subsystems should be using the
> > notifier callback on the die chain instead.  The included patch
> > incorporates that.  The priority is set to 0, hopefully causing the
> > notifier to be the first one called.
> > 
> 
> OK, if the goal is to force subsystems to rely on the die notifier chain
> instead of nmi_callback and to get rid of the set_nmi_callback() interface,
> then that spells some problems for kdump, as kdump is different from other
> subsystems. You rightly pointed out the question of what happens if the
> chain is corrupted or if some die notifier function hangs.

All NMI handlers think they are different and more special than everybody
else. Otherwise they wouldn't be NMI. kdump is really in no way special.

> 
> Looks like notifiers are called in priority order: looking at the code,
> the notifier with priority 0x7fffffff will be called first. But there is
> still no guarantee. Those registering first with this priority will be
> called first. Kdump registers at the end, hence it will be called last
> and is liable to fail.

Sorry, but that's just a dumb argument. All kernel code needs to cooperate
with others; if there is a problem, it just gets fixed. But having multiple
callbacks just because you don't trust someone else doesn't make sense.

-Andi
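Don's hope that priority 0 makes the notifier "the first one called", and Vivek's description of the ordering, can be checked against a small userspace model of a priority-sorted notifier chain. This is a sketch, not kernel code: `struct model_nb`, `model_chain_register()`, `model_chain_call()` and the `MODEL_NOTIFY_*` codes are hypothetical names, and the insertion rule (higher numeric priority first, equal priorities kept in registration order) is assumed to mirror the kernel's notifier registration as Vivek reads it.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modeled loosely on the kernel's NOTIFY_DONE / NOTIFY_STOP. */
#define MODEL_NOTIFY_DONE 0
#define MODEL_NOTIFY_STOP 1

struct model_nb {
    int (*call)(struct model_nb *nb, int event);
    struct model_nb *next;
    int priority;   /* higher value = called earlier */
    int id;         /* illustrative tag recorded when the handler runs */
};

/* Insert before the first entry with strictly lower priority, so entries
 * of equal priority keep registration order (first registered, first called). */
static void model_chain_register(struct model_nb **head, struct model_nb *n)
{
    assert(n != NULL);
    while (*head != NULL && n->priority <= (*head)->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Walk the chain, record the call order in out[], and stop early when a
 * handler returns MODEL_NOTIFY_STOP. Returns how many handlers ran. */
static int model_chain_call(struct model_nb *head, int event, int *out, int max)
{
    int ran = 0;
    for (struct model_nb *nb = head; nb != NULL && ran < max; nb = nb->next) {
        out[ran++] = nb->id;
        if (nb->call(nb, event) == MODEL_NOTIFY_STOP)
            break;
    }
    return ran;
}

/* Two trivial handlers: one lets the walk continue, one ends it. */
static int pass(struct model_nb *nb, int event) { (void)nb; (void)event; return MODEL_NOTIFY_DONE; }
static int stop(struct model_nb *nb, int event) { (void)nb; (void)event; return MODEL_NOTIFY_STOP; }
```

Under these rules a priority-0 notifier registered after another priority-0 notifier runs after it, and any earlier handler that returns a stop code (or hangs) keeps it from running at all. That is the scenario Vivek raises; Andi's position is that such a handler should be fixed rather than worked around with a second callback path.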
_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot
