Please consider the attached patch. SUMMARY
The current function to manually restore breakpoints (hw_breakpoint_restore) places the contents of the thread.debugreg6 variable back into the dr6 hardware register, a piece of hardware intended to be read, not used as local storage. The problem with this is that systems that register for breakpoints, including the perf event system and kgdb/kdb, modify this variable. I have observed artificially generated stacked debug statuses being cycled through dr6 and this variable over and over again in a loop by calls to hw_breakpoint_restore() that cause strange behaviors in kdb and this layer because this function allows user defined breakpoint statuses to be stuffed back into the dr6 register. This proposed function is necessary because upon debugger entry, you must set dr7 to 0 to disable breakpoints while inside the debugger then call hw_breakpoint_restore() to reload the state and re-enable them. This function is identical to hw_breakpoint_restore() except it leaves out writing a user defined thread.debugreg6 variable back into a hardware register not intended for that purpose. This behavior of stuffing articial values into dr6 can and does cause system crashes and software misbehavior. The proposed function improves the robustness of Linux. CONDITIONS WHICH RESULT IN THIS SYSTEM FAULT OR ERRONEOUS STATUS kgdb/kdb and the perf event system both present garbage status in dr6 then subsequently write this status into the thread.debugreg6 variable, then in some cases call hw_breakpoint_restore() which writes this status back into the dr6 hardware register. arch/x86/kernel/kgdb.c static void kgdb_hw_overflow_handler(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs) { struct task_struct *tsk = current; int i; for (i = 0; i < 4; i++) if (breakinfo[i].enabled) tsk->thread.debugreg6 |= (DR_TRAP0 << i); } I wanted to note here that this is the kgdb perf handler. What's a serious problem here (and other perf handlers do this too) is that kgdb has receive ONE breakpoint exception when this handler is called, then inside this handler he sets ALL exception bits for any user defined exceptions then sends this value to get stuffed back into the dr6 register, so for each breakpoint that happens, it will trigger a cascade of checks in the hw_breakpoint.c handler most hitting NULL bp structures when this value gets read back out of dr6. If one of them fires off and debugreg6 has gotten zapped a system crash can occur. arch/x86/kernel/kgdb.c static void kgdb_correct_hw_break(void) { ... snip ... if (!dbg_is_early) hw_breakpoint_restore(); ... snip ... } arch/x86/kernel/hw_breakpoint.c void hw_breakpoint_restore(void) { set_debugreg(__this_cpu_read(cpu_debugreg[0]), 0); set_debugreg(__this_cpu_read(cpu_debugreg[1]), 1); set_debugreg(__this_cpu_read(cpu_debugreg[2]), 2); set_debugreg(__this_cpu_read(cpu_debugreg[3]), 3); set_debugreg(current->thread.debugreg6, 6); set_debugreg(__this_cpu_read(cpu_dr7), 7); } The hardware only alters those bits that change, the rest of the altered dr6 value remains in the register. Upon the next int1 exception, dr6 presents this manufactured status to the int1 handler in hw_breakpoint.c which calls the non-existent breakpoint exceptions and any handlers which may have validly registered, creating phantom events. If other subsystems which call the perf handlers also have breakpoints registered, this manufactured status causes erroneous events to be signaled to the layers above. arch/x86/kernel/hw_breakpoint.c static int hw_breakpoint_handler(struct die_args *args) { ... snip ... /* Handle all the breakpoints that were triggered */ for (i = 0; i < HBP_NUM; ++i) { if (likely(!(dr6 & (DR_TRAP0 << i)))) continue; ... snip ... perf_bp_event(bp, args->regs); ... snip ... } After a few iterations of this cycling through the system, the thread.debugreg6 variable starts to resemble a random number generator as far as to which breakpoint just occurred. So it goes like this: INT1 exception #1 -> kgdb/perf -> set status bits for ALL exceptions I registered (1,2,3,4) -> stuff it back into dr6 -> INT1 exception #2 -> pickup bogus status in dr6 -> call handlers for (1,2,3,4) -> Oops someone changed thread.debugreg6 (kgdb,perf,kdb,gdb,os, current moved) -> CRASH TESTING AND REVIEW PERFORMED I have reviewed all the code that touches this patch and have determined it will function and support all of the software that depends on this handler properly. I have compiled and tested this patch with a test harness that tests the robustness of the linux breakpoint API and handlers in the following ways: 1. Setting multiple conditional breakpoints through arch_install_hw_breakpoint API across four processors to test the rate at which the interface can handle breakpoint exceptions 2. Setting unregistered breakpoints to test the handlers robustness in dealing with error handling conditions and errant or spurious hardware conditions and to simulate actual "lazy debug register switching" with null bp handlers to test the robustness of the handlers. 3. Clearing and setting breakpoints across multiple processors then triggering concurrent exceptions in both interrupt and process contexts. Signed-off-by: Jeff Merkey <linux....@gmail.com> --- arch/x86/kernel/hw_breakpoint.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index 50a3fad..7ea0f78 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -425,6 +425,16 @@ void hw_breakpoint_restore(void) } EXPORT_SYMBOL_GPL(hw_breakpoint_restore); +void hw_breakpoint_enable(void) +{ + set_debugreg(__this_cpu_read(cpu_debugreg[0]), 0); + set_debugreg(__this_cpu_read(cpu_debugreg[1]), 1); + set_debugreg(__this_cpu_read(cpu_debugreg[2]), 2); + set_debugreg(__this_cpu_read(cpu_debugreg[3]), 3); + set_debugreg(__this_cpu_read(cpu_dr7), 7); +} +EXPORT_SYMBOL_GPL(hw_breakpoint_enable); + /* * Handle debug exception notifications. * -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/