All the kernel debug style tools (kdb, kgdb, nlkd, netdump, lkcd,
crash, kdump etc.) have a common requirement: they need to do a crash
stop of the system.  This means stopping all the cpus, even if some of
the cpus are spinning with interrupts disabled.  In addition, each cpu
has to save enough state to start the diagnosis of the problem.

* Each debug style tool has written its own code for interrupting the
  other cpus and for saving cpu state.

* Some tools try a normal IPI first then send a non-maskable interrupt
  after a delay.

* Some tools always send an NMI first, which can result in incomplete
  or wrong machine state if the NMI arrives at the wrong time.

* Most of the tools do not know how to cope with the rendezvous
  algorithm defined by the IA64 architecture, which interferes with an
  OS driven rendezvous.

* Needless to say, every single patch set conflicts with all the
  others, which makes it very difficult to install more than one of the
  tools at a time.
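
As an illustration of the "normal IPI first, NMI after a delay" approach
mentioned in the list above, here is a minimal user space simulation.
It is only a sketch: struct sim_cpu and every sim_* function are
hypothetical names invented for this example, not code from the
crash_stop patches, and the "delay" is reduced to a single check of
which cpus are still running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu model, not taken from the crash_stop patches. */
struct sim_cpu {
	bool irqs_disabled;	/* spinning with interrupts masked */
	bool stopped;		/* has entered the stop handler */
	bool state_saved;	/* has recorded its own state */
};

/* What an interrupted cpu does: save its state, then park. */
static void sim_stop_handler(struct sim_cpu *cpu)
{
	cpu->state_saved = true;
	cpu->stopped = true;
}

/* A maskable IPI is only taken by cpus with interrupts enabled. */
static void sim_send_ipi(struct sim_cpu *cpus, size_t n, size_t self)
{
	for (size_t i = 0; i < n; i++)
		if (i != self && !cpus[i].stopped && !cpus[i].irqs_disabled)
			sim_stop_handler(&cpus[i]);
}

/* An NMI is taken regardless of the interrupt mask. */
static void sim_send_nmi(struct sim_cpu *cpus, size_t n, size_t self)
{
	for (size_t i = 0; i < n; i++)
		if (i != self && !cpus[i].stopped)
			sim_stop_handler(&cpus[i]);
}

/* Count the other cpus that have not stopped yet. */
static size_t sim_count_running(const struct sim_cpu *cpus, size_t n,
				size_t self)
{
	size_t running = 0;

	for (size_t i = 0; i < n; i++)
		if (i != self && !cpus[i].stopped)
			running++;
	return running;
}

/*
 * Stop every cpu except the caller.  Try the IPI first and fall back
 * to the NMI only for cpus that ignored it.  Real code would wait with
 * a timeout between the two steps.  Returns the number of cpus that
 * needed the NMI.
 */
size_t sim_crash_stop(struct sim_cpu *cpus, size_t n, size_t self)
{
	size_t still_running;

	cpus[self].state_saved = true;	/* caller saves state, keeps running */
	sim_send_ipi(cpus, n, self);
	still_running = sim_count_running(cpus, n, self);
	if (still_running > 0)
		sim_send_nmi(cpus, n, self);
	return still_running;
}
```

In this model only the cpus that were spinning with interrupts disabled
ever see the NMI, so the cpus that can respond to the IPI get to save
their state at a well defined point.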

The solution is to define a common crash_stop API that can be used by
_all_ of the debug style tools, without reinventing the wheel each
time.  The following crash_stop patches implement this API for i386,
x86_64 and ia64.  They correctly handle the complicated ia64 algorithm
for MCA and INIT, unlike almost every current debug style tool.

Adding other architectures is a fairly simple matter: define
the IPI and NMI routines (the crash_stop_$(ARCH)_handlers patch),
intercept the events that indicate that the system is dying (the
crash_stop_$(ARCH) patch), and update the Kconfig entry for CRASH_STOP
to add the new $(ARCH).
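
To make the porting steps a little more concrete, the fragment below
models the first two of them in user space: a table holding the IPI and
NMI routines for an architecture, and an intercept function that starts
a stop only for events that mean the system is dying.  Everything here
(struct arch_stop_hooks, enum sim_event, the demo_* and sim_* names) is
hypothetical and exists only for this illustration.  The interface that
a new architecture really has to provide is the one defined in the
crash_stop_header and crash_stop_common patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table; the real patches define their own interface. */
struct arch_stop_hooks {
	const char *arch;
	void (*send_ipi)(void);		/* maskable stop request */
	void (*send_nmi)(void);		/* non-maskable fallback */
};

/* Hypothetical events that an architecture might report. */
enum sim_event {
	SIM_EVENT_BREAKPOINT,		/* routine, system is not dying */
	SIM_EVENT_PANIC,
	SIM_EVENT_NMI_WATCHDOG,
};

/* Counters so the demo routines have an observable effect. */
static int demo_ipi_count;
static int demo_nmi_count;

static void demo_send_ipi(void) { demo_ipi_count++; }
static void demo_send_nmi(void) { demo_nmi_count++; }

/* Step 1: a made-up architecture supplies its IPI and NMI routines. */
const struct arch_stop_hooks demo_hooks = {
	.arch = "demo",
	.send_ipi = demo_send_ipi,
	.send_nmi = demo_send_nmi,
};

/* A port is only usable once both routines are present. */
bool arch_hooks_complete(const struct arch_stop_hooks *h)
{
	return h && h->arch && h->send_ipi && h->send_nmi;
}

/* Does this event indicate that the system is dying? */
bool sim_event_is_fatal(enum sim_event ev)
{
	return ev == SIM_EVENT_PANIC || ev == SIM_EVENT_NMI_WATCHDOG;
}

/*
 * Step 2: intercept an event.  Fatal events start the stop by sending
 * the IPI; anything else is left alone.  Returns true if a stop was
 * started.
 */
bool sim_intercept(const struct arch_stop_hooks *h, enum sim_event ev)
{
	if (!arch_hooks_complete(h) || !sim_event_is_fatal(ev))
		return false;
	h->send_ipi();
	return true;
}
```

The third step, the Kconfig update, has no code equivalent here; it only
makes CRASH_STOP selectable on the new architecture.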

Most of the design documentation is in the crash_stop_common patch.
Please read that before replying.

crash_stop_header               The architecture independent header.

crash_stop_common               The architecture independent code.

crash_stop_i386_handlers        i386 specific code to send and respond
                                to the crash_stop IPI and NMI.

crash_stop_i386                 i386 specific code to intercept events
                                that indicate that the system is dying.

crash_stop_x86_64_nmiwatchdog   i386 creates an event for the NMI
                                watchdog, but it is missing from
                                x86_64.  Add DIE_NMIWATCHDOG to x86_64.

crash_stop_x86_64_handlers      x86_64 specific code to send and
                                respond to the crash_stop IPI and NMI.

crash_stop_x86_64               x86_64 specific code to intercept
                                events that indicate that the system is
                                dying.

crash_stop_ia64_handlers        ia64 specific code to send and respond
                                to the crash_stop IPI and NMI.

crash_stop_ia64                 ia64 specific code to intercept events
                                that indicate that the system is dying.

crash_stop_common_Kconfig       Add crash_stop to the config system.
                                Only for i386, x86_64 and ia64 at the
                                moment; extend as new architectures are
                                added.

crash_stop_demo                 A demonstration of using crash_stop in
                                a debug style tool.  Not for inclusion
                                in the kernel.

crash_stop_test                 Test the crash_stop code.  Not for
                                inclusion in the kernel.