On Tue, 2006-05-02 at 12:19 +0200, Andi Kleen wrote:
> On Tuesday 02 May 2006 12:11, Fernando Luis Vazquez Cao wrote:
> > > All NMI handlers think they are different and more special than everybody
> > > else. Otherwise they wouldn't be NMI. kdump is really in no way special.
> > If what we want is a reliable crash dumping solution kdump should be
> > treated as a special case (see discussion below).
> 
> It's special enough to just set a high priority. More speciality is really
> not needed.
True if the NMI callback is granted the highest priority and die_chain
is not corrupted.

Besides, if we use the notify_die approach and there is an nmi_callback
registered at the time of the crash, we have to either unset the
nmi_callback or depend on it being robust against crashes (in particular,
safe against stack overflow) and returning 0 (see the current NMI handler
implementation below). In the former case the current call to

set_nmi_callback(crash_nmi_callback);

would become

unset_nmi_callback();
register_die_notifier(&crash_nmi_exceptions_nb);
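
To make the priority argument concrete, here is a minimal userspace sketch
of a priority-ordered notifier chain. It is not the kernel implementation:
every sketch_* name is hypothetical and only mirrors the general shape of
the notifier API. It assumes that registration keeps the chain sorted by
descending priority and that a notifier can stop the walk by returning a
"stop" value.

```c
#include <stddef.h>

/* Userspace model only; these values are local to the sketch. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_STOP = 1 };

struct sketch_notifier {
        int (*call)(unsigned long event, void *data);
        int priority;                   /* higher priority runs first */
        struct sketch_notifier *next;
};

/* Insert nb so that the list stays sorted by descending priority. */
static void sketch_register(struct sketch_notifier **head,
                            struct sketch_notifier *nb)
{
        while (*head && (*head)->priority >= nb->priority)
                head = &(*head)->next;
        nb->next = *head;
        *head = nb;
}

/* Walk the chain and stop as soon as a notifier claims the event. */
static int sketch_call_chain(struct sketch_notifier *head,
                             unsigned long event, void *data)
{
        int ret = SKETCH_NOTIFY_DONE;

        for (; head; head = head->next) {
                ret = head->call(event, data);
                if (ret == SKETCH_NOTIFY_STOP)
                        break;
        }
        return ret;
}

static int sketch_calls_seen;           /* number of notifiers that ran */

/* Stand-in for a crash notifier: claims the event unconditionally. */
static int sketch_crash_notifier(unsigned long event, void *data)
{
        (void)event;
        (void)data;
        sketch_calls_seen++;
        return SKETCH_NOTIFY_STOP;
}

/* Stand-in for any other subscriber on the same chain. */
static int sketch_other_notifier(unsigned long event, void *data)
{
        (void)event;
        (void)data;
        sketch_calls_seen++;
        return SKETCH_NOTIFY_DONE;
}
```

With the crash notifier registered at the highest priority, no other
subscriber runs, but only as long as the chain itself is intact: the walk
still dereferences every ->next pointer up to the crash entry.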

--
asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
{
        int cpu = safe_smp_processor_id();

        nmi_enter();
        add_pda(__nmi_count,1);
        if (!rcu_dereference(nmi_callback)(regs, cpu))
                default_do_nmi(regs);
        nmi_exit();
}

asmlinkage __kprobes void default_do_nmi(struct pt_regs *regs)
{
        unsigned char reason = 0;
        int cpu;

        cpu = smp_processor_id();

        /* Only the BSP gets external NMIs from the system.  */
        if (!cpu)
                reason = get_nmi_reason();

        if (!(reason & 0xc0)) {
                if (notify_die(DIE_NMI_IPI, "nmi_ipi", regs, reason, 2, SIGINT)
                                                        == NOTIFY_STOP)
                        return;
#ifdef CONFIG_X86_LOCAL_APIC
.....
}

> > Besides, the default NMI handler and the notify_die function itself use
> > the stack profusely without checking the validity of the stack pointer
> > or the state of the stacks (of course this applies to the current
> > implementation too).
> 
> It runs on a special reserved NMI stack. And if that doesn't work
> anymore then you'll never execute any NMI code because the CPU 
> won't be able to write the initial stack frame.
Now that I know that you were referring to x86_64 let me be more
specific.

When I said that we do not check the validity of the stack pointer I was
referring to the fact that we do not check how much memory is free in
the stack. We may be able to write the initial stack frame and have
enough space for do_nmi's local variables, but the subsequent calls to
default_do_nmi and notify_die may still bloat the stack. x86_64 uses a
reserved NMI stack and this certainly reduces the probability of such an
event, but we cannot predict what will happen when the system goes nuts.
For this reason it is safer to switch to a crash-time-specific stack in
the event of a crash, even if we have private NMI stacks. Unfortunately,
i386 has neither of the two (please correct me if I am wrong).
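
As an illustration of the kind of check I mean, here is a rough userspace
sketch of a stack headroom test for a downward-growing stack. The function
names, the stack layout and the "needed" threshold are all hypothetical;
nothing like this exists in the current handlers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes still free below sp in a downward-growing stack that occupies
 * [base, base + size). Returns 0 if sp lies outside the stack, i.e. the
 * stack pointer cannot be trusted at all.
 */
static size_t sketch_stack_headroom(uintptr_t base, size_t size, uintptr_t sp)
{
        if (sp < base || sp > base + size)
                return 0;
        return (size_t)(sp - base);
}

/*
 * Nonzero if fewer than `needed` bytes remain, in which case the crash
 * path should move to a dedicated crash-time stack before calling on.
 */
static int sketch_should_switch_stack(uintptr_t base, size_t size,
                                      uintptr_t sp, size_t needed)
{
        return sketch_stack_headroom(base, size, sp) < needed;
}
```

A real version would have to read the stack pointer and the stack bounds
in an architecture-specific way, and would have to run before any deeper
call is made.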

Regards,

Fernando
