http://lxr.linux.no/linux/Documentation/nmi_watchdog.txt

[NMI watchdog is available for x86 and x86-64 architectures]

Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debug
such lockups? If the answer to all of these is yes, then this document
is definitely for you.

On many x86/x86-64 systems there is a hardware feature that lets us
generate 'watchdog NMI interrupts' (NMI: Non-Maskable Interrupt, which
is delivered even if the system is otherwise locked up hard). This can
be used to debug hard kernel lockups. By generating periodic NMI
interrupts, the kernel can monitor whether any CPU has locked up, and
print out debugging messages if so.

In order to use the NMI watchdog, you need to have APIC support in your
kernel. For SMP kernels, APIC support gets compiled in automatically. For
UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
features -> IO-APIC support on uniprocessors) in your kernel config.
CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
CONFIG_X86_UP_IOAPIC is for uniprocessor machines with an IO-APIC.
[Note: certain kernel debugging options, such as Kernel Stack Meter or
Kernel Tracer, may implicitly disable the NMI watchdog.]

For x86-64, the needed APIC is always compiled in, and the NMI watchdog is
always enabled with I/O-APIC mode (nmi_watchdog=1).

Using local APIC (nmi_watchdog=2) needs the first performance register, so
you can't use it for other purposes (such as high precision performance
profiling). However, at least oprofile and the perfctr driver disable the
local APIC NMI watchdog automatically.

To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter. E.g. the relevant lilo.conf entry:

        append="nmi_watchdog=1"

For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
For UP machines without an IO-APIC use nmi_watchdog=2; this only works
for some processor types. If in doubt, boot with nmi_watchdog=1 and
check the NMI count in /proc/interrupts; if the count is zero then
reboot with nmi_watchdog=2 and check the NMI count. If it is still
zero then log a problem; you probably have a processor that needs to be
added to the nmi code.

A 'lockup' is the following scenario: if any CPU in the system does not
execute the periodic local timer interrupt for more than 5 seconds, then
the NMI handler generates an oops and kills the process. This
'controlled crash' (and the resulting kernel messages) can be used to
debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
the oops will show up automatically. If the kernel produces no messages
then the system has crashed so hard (e.g. hardware-wise) that either it
cannot even accept NMI interrupts, or the crash has made the kernel
unable to print messages.

Be aware that when using local APIC, the frequency of the NMI interrupts
it generates depends on the system load. The local APIC NMI watchdog,
lacking a better source, uses the "cycles unhalted" event. As you may
guess, it doesn't tick when the CPU is in the halted state (which happens
when the system is idle), but if your system locks up on anything but the
"hlt" processor instruction, the watchdog will trigger very soon, as the
"cycles unhalted" event will happen every clock tick. If it locks up on
"hlt", then you are out of luck: the event will not happen at all and the
watchdog won't trigger. This is a shortcoming of the local APIC watchdog;
unfortunately there is no "clock ticks" event that would work all the
time. The I/O APIC watchdog is driven externally and has no such
shortcoming, but its NMI frequency is much higher, resulting in a more
significant hit to the overall system performance.
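The lockup rule and the local APIC shortcoming described above can be
illustrated with a small simulation. This is a simplified user-space
model, not the kernel's implementation: the NMI rate `nmi_hz`, the
`CpuState` fields and the helper names are hypothetical. On each
watchdog NMI the handler compares the CPU's count of handled local
timer interrupts with the value seen at the previous NMI, and reports a
lockup once that count has stood still for 5 seconds' worth of NMIs. In
"local_apic" mode no NMI is generated while the CPU is halted, which
models the "cycles unhalted" event.

```python
from dataclasses import dataclass
from typing import Optional

LOCKUP_SECONDS = 5  # threshold stated in the text


@dataclass
class CpuState:
    """Per-CPU bookkeeping for the simulated watchdog (hypothetical model)."""
    timer_irqs: int = 0   # local timer interrupts handled so far
    last_seen: int = -1   # timer_irqs value observed at the previous NMI
    stuck_nmis: int = 0   # consecutive NMIs that saw no timer progress
    halted: bool = False  # True if the CPU is sitting in "hlt"


def watchdog_nmi(cpu: CpuState, nmi_hz: int) -> bool:
    """Handle one watchdog NMI on `cpu`; return True if a lockup is detected."""
    if cpu.timer_irqs == cpu.last_seen:
        cpu.stuck_nmis += 1
        # 5 seconds' worth of NMIs without a timer interrupt: report a lockup.
        return cpu.stuck_nmis >= LOCKUP_SECONDS * nmi_hz
    # The timer interrupt ran since the last NMI, so the CPU is alive.
    cpu.last_seen = cpu.timer_irqs
    cpu.stuck_nmis = 0
    return False


def simulate(seconds: int, nmi_hz: int, mode: str, cpu: CpuState) -> Optional[int]:
    """Run a locked-up CPU for `seconds`; return the NMI index of the oops, if any.

    mode "io_apic": NMIs arrive regardless of CPU state (externally driven).
    mode "local_apic": NMIs come from the "cycles unhalted" event, so none
    arrive while the CPU is halted.
    """
    for n in range(seconds * nmi_hz):
        # The CPU is locked up: timer_irqs never advances inside this loop.
        if mode == "local_apic" and cpu.halted:
            continue  # no unhalted cycles, so no NMI is generated
        if watchdog_nmi(cpu, nmi_hz):
            return n
    return None


if __name__ == "__main__":
    print(simulate(10, 1, "io_apic", CpuState(halted=True)))
    print(simulate(10, 1, "local_apic", CpuState(halted=True)))
```

With `nmi_hz=1`, a CPU spinning with interrupts disabled is flagged
after 5 NMIs in either mode, while a CPU stuck in "hlt" is only caught
in "io_apic" mode; in "local_apic" mode `simulate` returns None.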

NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default;
you have to enable it with a boot time parameter. Prior to 2.4.2-ac18
the NMI-oopser was enabled unconditionally on x86 SMP boxes.

On x86-64 the NMI oopser is on by default. On 64-bit Intel CPUs
it uses IO-APIC by default and on AMD it uses local APIC.

[ feel free to send bug reports, suggestions and patches to
  Ingo Molnar <[EMAIL PROTECTED]> or the Linux SMP mailing
  list at <[EMAIL PROTECTED]> ]
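The mode-selection procedure (boot with nmi_watchdog=1, check the NMI
count in /proc/interrupts, and fall back to nmi_watchdog=2 if it stays
at zero) is easy to script. The Python sketch below assumes the usual
/proc/interrupts layout, where the NMI row begins with "NMI:" followed
by one count per CPU and, on some kernels, a trailing description. The
column layout varies between kernel versions, so treat the parser as an
assumption rather than a stable interface. The helper names `nmi_counts`
and `next_step` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def nmi_counts(text: str) -> Optional[List[int]]:
    """Return per-CPU NMI counts from /proc/interrupts-style text.

    Assumes a row whose first field is "NMI:" followed by one integer per
    CPU. Returns None if no NMI row is present.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description, if any
                counts.append(int(field))
            return counts
    return None


def next_step(text: str, current_mode: int) -> str:
    """Suggest the next action, following the procedure in this document."""
    counts = nmi_counts(text)
    if counts and sum(counts) > 0:
        return f"nmi_watchdog={current_mode} is working"
    if current_mode == 1:
        return "reboot with nmi_watchdog=2 and check again"
    return "report a problem: processor may need adding to the nmi code"


if __name__ == "__main__":
    interrupts = Path("/proc/interrupts")
    if interrupts.exists():
        print("NMI counts per CPU:", nmi_counts(interrupts.read_text()))
```

A nonzero count only shows that NMIs have been delivered; reading the
file twice a few seconds apart and checking that the counts grow is a
stronger sign that the watchdog is ticking.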
