On Tue, 2017-03-21 at 16:34 +0900, AKASHI Takahiro wrote:
> Yes, it is intentional. I removed 'offline' code in my v14 (2016/3/4).
> As you assumed, I'd expect 'online' status of all CPUs to be kept
> unchanged in the core dump.

I wonder if it would be better to take a *copy* of the online mask and
put it back after we're done taking the CPUs down? As things stand, we
now have *three* different methods of taking down all the CPUs... and
*none* of them allows a platform to override it with an NMI-based or
STONITH-based method, which seems like something of an oversight.
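
To make the suggestion concrete, here is a rough userspace sketch of
"copy the mask, stop the CPUs, put the copy back", with a hook a
platform could override. It is not kernel code: cpu_mask_t,
online_mask, platform_stop_others and crash_stop_other_cpus are all
hypothetical names invented for illustration, and the default stop
routine simply pretends every CPU responded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's online mask: one bit per CPU. */
typedef uint64_t cpu_mask_t;

/*
 * Hypothetical platform hook. Stops every CPU in `online` except `self`
 * and returns the mask of CPUs that failed to stop.
 */
typedef cpu_mask_t (*stop_others_fn)(cpu_mask_t online, unsigned int self);

cpu_mask_t online_mask;
stop_others_fn platform_stop_others; /* NULL: use the default method */

/* Default method in this model: assume every CPU stopped. */
cpu_mask_t default_stop_others(cpu_mask_t online, unsigned int self)
{
    (void)online;
    (void)self;
    return 0;
}

/*
 * Stop all other CPUs, then restore the online mask from a copy so that
 * a later core dump still sees the pre-crash online state.
 * Returns the mask of CPUs that could not be stopped.
 */
cpu_mask_t crash_stop_other_cpus(unsigned int self)
{
    assert(self < 64);

    cpu_mask_t saved = online_mask; /* the *copy* */
    stop_others_fn stop = platform_stop_others ? platform_stop_others
                                               : default_stop_others;
    cpu_mask_t stuck = stop(saved, self);

    /* Model the shutdown path marking stopped CPUs offline... */
    online_mask = ((cpu_mask_t)1 << self) | stuck;

    /* ...and put the original state back once we're done. */
    online_mask = saved;

    return stuck;
}
```

A platform with an NMI-based or STONITH-based method would set
platform_stop_others before the crash path runs; everything else stays
the same.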

> If you can agree, I would like to modify this disputed warning code to:
> 
> +     BUG_ON(!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1)));
> +     WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
> +             "Some CPUs may be stale, kdump will be unreliable.\n");

That works; thanks.
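
For the record, this is how I read the combined effect of those two
checks, written as a small userspace model. The enum and the function
name classify_cpu_state are made up for illustration; the real code
uses BUG_ON() and WARN() directly, and the crash_stop_failed parameter
stands in for the result of smp_crash_stop_failed().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the two checks in the quoted patch. */
enum check_result { CHECK_OK, CHECK_WARN, CHECK_BUG };

enum check_result classify_cpu_state(bool in_kexec_crash, bool stuck_cpus,
                                     unsigned int online_cpus,
                                     bool crash_stop_failed)
{
    /* The CPU running this check is always online. */
    assert(online_cpus >= 1);

    /* Normal kexec: any stuck or still-online secondary CPU is fatal. */
    if (!in_kexec_crash && (stuck_cpus || online_cpus > 1))
        return CHECK_BUG;

    /* Crash kexec: carry on, but warn that kdump may be unreliable. */
    if (in_kexec_crash && (stuck_cpus || crash_stop_failed))
        return CHECK_WARN;

    return CHECK_OK;
}
```

Note that in the crash case the number of online CPUs is deliberately
not checked, which matches keeping the 'online' status unchanged for
the core dump.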

FWIW I'm currently blaming my platform's firmware for my sporadic
crash-on-CPU#1 failures. If your testing includes crashes on non-boot
CPUs (perhaps using the sysrq hack I posted) and it reliably passes for
you, then let's ignore that for now.

_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
