Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Sergei Shtylyov wrote:
If there are no objections, I will commit the following patch.

Sorry for the belated objections. :-

Signed-off-by: Jason Wessel [EMAIL PROTECTED]

Index: linux-2.6.21.1/kernel/kgdb.c
===================================================================
--- linux-2.6.21.1.orig/kernel/kgdb.c
+++ linux-2.6.21.1/kernel/kgdb.c
[...]
@@ -848,12 +856,36 @@ int kgdb_handle_exception(int ex_vector,
 	long kgdb_usethreadid = 0;
 	int error = 0, all_cpus_synced = 0;
 	struct pt_regs *shadowregs;
-	int processor = smp_processor_id();
+	int processor = raw_smp_processor_id();
 	void *local_debuggerinfo;
 
 	/* Panic on recursive debugger calls. */
-	if (atomic_read(&debugger_active) == smp_processor_id() + 1)
+	if (atomic_read(&debugger_active) == raw_smp_processor_id() + 1) {
+		exception_level++;
+		addr = kgdb_arch_pc(ex_vector, linux_regs);
+		kgdb_deactivate_sw_breakpoints();
+		if (kgdb_remove_sw_break(addr) == 0) {
+			/* If the break point removed ok at the place exception
+			 * occurred, try to recover and print a warning to the end
+			 * user because the user planted a breakpoint in a place
+			 * that KGDB needs in order to function.
+			 */
+			exception_level = 0;
+			kgdb_skipexception(ex_vector, linux_regs);
+			kgdb_activate_sw_breakpoints();
+			printk(KERN_CRIT "KGDB: re-enter exception: breakpoint removed\n");
+			WARN_ON(1);

And won't WARN_ON(1) in turn cause an invalid operation exception?

+			return 0;
+		}

WBR, Sergei

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Hello,

I wrote:
If there are no objections, I will commit the following patch.

--

This patch fixes some corner cases where KGDB will silently hang or kill the system, if a user accidentally tries to source step into a spin_unlock() call or source step in on a macro containing smp_processor_id(). The use of raw_smp_processor_id is desired in kernel/kgdb.c to fix this particular problem.

To fix issues with accidental source step in on spin_unlock(), the idea is to check for the existence of a break point on the second entry to kgdb and try to remove it. Next an attempt will be made to continue normal operations. A third entry will generate a panic(), so as to stop infinite loops.

Testing has shown that kgdb is much more robust with these changes and random accidental run control.

It seems that something has been broken WRT backtracing... Here's the more relevant trace -- got it at the initial KGDB breakpoint:

breakpoint () at kernel/kgdb.c:1773
1773	}
(gdb) bt
#0  breakpoint () at kernel/kgdb.c:1773
#1  0x808c8811 in kgdb_late_entry () at kernel/kgdb.c:1752
#2  0x808b2a29 in init (unused=0x0) at init/main.c:692
#3  0x8020ae38 in child_rip ()
#4  0x in ?? ()
Ignoring packet error, continuing...
Reply contains invalid hex digit 116

And here's the start of the endless trace on the target -- it suggests that KGDB has caught a GPF:

KGDB: re-enter exception: ALL breakpoints removed
Kernel panic - not syncing: Recursive entry to debugger

Call Trace:
#DB [80228ccb] panic+0xe9/0x1a9
[80258f3a] kgdb_handle_exception+0x11c/0xf92
[80585cad] kfree_skbmem+0x84/0x8b
[802586f3] kgdb_mem2hex+0x311/0x35d
[8021d433] kgdb_notify+0xfb/0x105
[80236b36] notifier_call_chain+0x2c/0x43
[80236d61] raw_notifier_call_chain+0x9/0xb
[8020c3ef] do_general_protection+0xf7/0x121
[8062f46d] error_exit+0x0/0x84
[80258408] kgdb_mem2hex+0x26/0x35d
[802586f3] kgdb_mem2hex+0x311/0x35d
[80258408] kgdb_mem2hex+0x26/0x35d
[80258910] kgdb_hex2long+0x29/0x5f
[80259534] kgdb_handle_exception+0x716/0xf92
[8021d433] kgdb_notify+0xfb/0x105
[80236b36] notifier_call_chain+0x2c/0x43
[80236d61] raw_notifier_call_chain+0x9/0xb
[8020c621] do_int3+0x44/0xa7
[8062f7b3] int3+0x93/0xb0
[80259eb6] breakpoint+0xf/0x1b
EOE
[808c8811] kgdb_late_entry+0xba/0xc3
[808b2a29] init+0x1a2/0x37c
[8020ae38] child_rip+0xa/0x12
[803f8cb4] acpi_ds_init_one_object+0x0/0x88
[808b2887] init+0x0/0x37c
[8020ae2e] child_rip+0x0/0x12

Kernel panic - not syncing: Recursive entry to debugger

Call Trace:
#DB[1] [80228ccb] panic+0xe9/0x1a9
[8021d47a] kgdb_skipexception+0x22/0x3e
[80258f3a] kgdb_handle_exception+0x11c/0xf92
[80259eb5] breakpoint+0xe/0x1b
[8021d433] kgdb_notify+0xfb/0x105
[80236b36] notifier_call_chain+0x2c/0x43
[80236d61] raw_notifier_call_chain+0x9/0xb
[8020c621] do_int3+0x44/0xa7
[8062f7b3] int3+0x93/0xb0
[80259eb6] breakpoint+0xf/0x1b
EOE
#DB [80259ecb] kgdb_panic_notify+0x9/0xd
[80236b36] notifier_call_chain+0x2c/0x43
[80236c0e] atomic_notifier_call_chain+0x34/0x4e
[80228cee] panic+0x10c/0x1a9
[80258f3a] kgdb_handle_exception+0x11c/0xf92
[80585cad] kfree_skbmem+0x84/0x8b
[802586f3] kgdb_mem2hex+0x311/0x35d
[8021d433] kgdb_notify+0xfb/0x105
[80236b36] notifier_call_chain+0x2c/0x43
[80236d61] raw_notifier_call_chain+0x9/0xb
[8020c3ef] do_general_protection+0xf7/0x121
[8062f46d] error_exit+0x0/0x84
[80258408] kgdb_mem2hex+0x26/0x35d
[802586f3] kgdb_mem2hex+0x311/0x35d
[80258408] kgdb_mem2hex+0x26/0x35d
[80258910] kgdb_hex2long+0x29/0x5f
[80259534] kgdb_handle_exception+0x716/0xf92
[8021d433] kgdb_notify+0xfb/0x105
[80236b36] notifier_call_chain+0x2c/0x43
[80236d61] raw_notifier_call_chain+0x9/0xb
[8020c621] do_int3+0x44/0xa7
[8062f7b3] int3+0x93/0xb0
[80259eb6] breakpoint+0xf/0x1b
EOE
[808c8811] kgdb_late_entry+0xba/0xc3
[808b2a29] init+0x1a2/0x37c
[8020ae38] child_rip+0xa/0x12
[803f8cb4] acpi_ds_init_one_object+0x0/0x88
[808b2887] init+0x0/0x37c
[8020ae2e] child_rip+0x0/0x12

Kernel panic - not syncing: Recursive entry to debugger

Call Trace:

[a lot of traceless panics skipped]

Call Trace:
#MC [80228ccb] panic+0xe9/0x1a9
[8021d47a] kgdb_skipexception+0x22/0x3e
[80258f3a] kgdb_handle_exception+0x11c/0xf92
[80259eb5] breakpoint+0xe/0x1b
[8021d433] kgdb_notify+0xfb/0x105
[80236b36]
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Hello.

Jason Wessel wrote:
If not can you provide me with a test case to see the problem?

Well, for example, with 2.6.18-rt7 kernel, stepping into smp_processor_id() was blowing away U-Boot (!) on my PPC board (even in PREEMPT_DESKTOP mode)... On x86, stepping past preempt_disable() caused reboots.

Sergei,

Now that sounds more interesting. Perhaps you can elaborate on the configuration further, or provide the results of a test. I wonder if I

It was happening in PREEMPT_DESKTOP mode when stepping into slab_irq_save() -- which contains cpu_processor_id() in this mode.

have the same board I can try that test on? In theory it should already be fixed, after merging the code. On ppc boards I have seen this happen with evlog before where it can scramble the boot loader.

Looks like it's been fixed indeed. But still the session stalls in RT mode -- here the macro looks different...

For the x86 problem, which part of the kernel should I step through to try out the preempt_disable()? Are you using a low level single step, a source step in, step over, or finish frame? Certainly you cannot step through all parts of the kernel, but it would be good to understand where you cannot and why it would work by turning off DEBUG_PREEMPT.

This also seems to work now. Thanks for unravelling this mystery (which we were too lazy to do :-).

Jason.

WBR, Sergei
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Sergei Shtylyov wrote: [Sat May 19 2007, 02:54:00PM EDT]
Jason Wessel wrote:
This patch is committed in the linux2_6_21_uprev branch across:
core-lite.patch
core.patch
i386-lite.patch
x86_64-lite.patch

BTW, what's the reason we *still* have both {core|i386|ia64|powerpc|x86_64}-lite.patch and core{core|i386|ia64|powerpc|x86_64}.patch? Why not just merge them?

Jason.

WBR, Sergei

Well, I thought that the core-related patches weren't destined for mainline. At least, I think that was the objective in 2006. For ia64, a coworker attempted to use Jason's git repository. I don't know the details, but he wasn't successful. The ia64.patch in core would meet significant resistance when going upstream, so please withdraw it from inclusion in the git tree. The patch primarily supports enabling breakpoints shortly after entering setup_arch and is of doubtful usefulness to others.

thanks,
bob
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Jason Wessel wrote:
This patch is committed in the linux2_6_21_uprev branch across:
core-lite.patch
core.patch
i386-lite.patch
x86_64-lite.patch

BTW, what's the reason we *still* have both {core|i386|ia64|powerpc|x86_64}-lite.patch and core{core|i386|ia64|powerpc|x86_64}.patch? Why not just merge them?

Jason.

WBR, Sergei
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jason Wessel wrote:
If there are no objections, I will commit the following patch.

Sounds great to me; avoiding spin locks() is a hassle.

Ever noticed a problem with kgdb surviving a weekend of non-use? I hit a breakpoint on Saturday and hoped to continue looking at it today, but the kgdb-stub, as usual, got out of phase and I had to re-do the test that caused the breakpoint; I've seen it happen many times.

- -piet

--

This patch fixes some corner cases where KGDB will silently hang or kill the system, if a user accidentally tries to source step into a spin_unlock() call or source step in on a macro containing smp_processor_id(). The use of raw_smp_processor_id is desired in kernel/kgdb.c to fix this particular problem.

To fix issues with accidental source step in on spin_unlock(), the idea is to check for the existence of a break point on the second entry to kgdb and try to remove it. Next an attempt will be made to continue normal operations. A third entry will generate a panic(), so as to stop infinite loops.

Testing has shown that kgdb is much more robust with these changes and random accidental run control.

Signed-off-by: Jason Wessel [EMAIL PROTECTED]

---
 arch/i386/kernel/kgdb.c |    8 ++++++++
 kernel/kgdb.c           |   47 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 7 deletions(-)

Index: linux-2.6.21.1/kernel/kgdb.c
===================================================================
--- linux-2.6.21.1.orig/kernel/kgdb.c
+++ linux-2.6.21.1/kernel/kgdb.c
@@ -70,6 +70,8 @@ int kgdb_connected;
 int kgdb_may_fault;
 /* All the KGDB handlers are installed */
 int kgdb_from_module_registered = 0;
+/* Guard for recursive entry */
+static int exception_level = 0;
 /* We provide a kgdb_io_ops structure that may be overriden.
  */
 struct kgdb_io __attribute__ ((weak)) kgdb_io_ops;
@@ -166,6 +168,12 @@ int __attribute__ ((weak))
 	return 0;
 }
 
+unsigned long __attribute__ ((weak))
+kgdb_arch_pc(int exception, struct pt_regs *regs)
+{
+	return instruction_pointer(regs);
+}
+
 static int hex(char ch)
 {
 	if ((ch >= 'a') && (ch <= 'f'))
@@ -580,11 +588,11 @@ static void kgdb_wait(struct pt_regs *re
 	int processor;
 
 	local_irq_save(flags);
-	processor = smp_processor_id();
+	processor = raw_smp_processor_id();
 	kgdb_info[processor].debuggerinfo = regs;
 	kgdb_info[processor].task = current;
 	atomic_set(&procindebug[processor], 1);
-	atomic_set(&kgdb_sync_softlockup[smp_processor_id()], 1);
+	atomic_set(&kgdb_sync_softlockup[raw_smp_processor_id()], 1);
 
 	/* Wait till master processor goes completely into the debugger.
 	 * FIXME: this looks racy */
@@ -784,7 +792,7 @@ static inline int shadow_pid(int realpid
 	if (realpid) {
 		return realpid;
 	}
-	return pid_max + smp_processor_id();
+	return pid_max + raw_smp_processor_id();
 }
 
 static char gdbmsgbuf[BUFMAX + 1];
@@ -848,12 +856,36 @@ int kgdb_handle_exception(int ex_vector,
 	long kgdb_usethreadid = 0;
 	int error = 0, all_cpus_synced = 0;
 	struct pt_regs *shadowregs;
-	int processor = smp_processor_id();
+	int processor = raw_smp_processor_id();
 	void *local_debuggerinfo;
 
 	/* Panic on recursive debugger calls. */
-	if (atomic_read(&debugger_active) == smp_processor_id() + 1)
+	if (atomic_read(&debugger_active) == raw_smp_processor_id() + 1) {
+		exception_level++;
+		addr = kgdb_arch_pc(ex_vector, linux_regs);
+		kgdb_deactivate_sw_breakpoints();
+		if (kgdb_remove_sw_break(addr) == 0) {
+			/* If the break point removed ok at the place exception
+			 * occurred, try to recover and print a warning to the end
+			 * user because the user planted a breakpoint in a place
+			 * that KGDB needs in order to function.
+			 */
+			exception_level = 0;
+			kgdb_skipexception(ex_vector, linux_regs);
+			kgdb_activate_sw_breakpoints();
+			printk(KERN_CRIT "KGDB: re-enter exception: breakpoint removed\n");
+			WARN_ON(1);
+			return 0;
+		}
+		remove_all_break();
+		kgdb_skipexception(ex_vector, linux_regs);
+		if (exception_level > 1)
+			panic("Recursive entry to debugger");
+
+		printk(KERN_CRIT "KGDB: re-enter exception: ALL breakpoints removed\n");
+		panic("Recursive entry to debugger");
 		return 0;
+	}
 
 acquirelock:
@@ -864,7 +896,7 @@ int kgdb_handle_exception(int ex_vector,
 	local_irq_save(flags);
 
 	/* Hold debugger_active */
-	procid = smp_processor_id();
+	procid = raw_smp_processor_id();
 
 	while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
 		int i = 25;	/* an arbitrary number */
@@ -877,7 +909,7 @@ int
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Pete/Piet Delaney wrote:
Sounds great to me; avoiding spin locks() is a hassle.

Ever noticed a problem with kgdb surviving a weekend of non-use? I hit a breakpoint on Saturday and hoped to continue looking at it today, but the kgdb-stub, as usual, got out of phase and I had to re-do the test that caused the breakpoint; I've seen it happen many times.

- -piet

Do you have a gdb serial log that shows the connection was actually still alive? Depending on what you are debugging, the state of the timers and the real-time clock could have something to do with it when the system is left paused for that length of time. As you probably know, KGDB busy-spins the processors the whole time that KGDB is active. It is also plausible that the system's NET_POLL driver had a failure, as it would certainly continue to transmit and receive responses for ARP requests.

I have not seen this type of failure firsthand, but I am interested in whether there is a way to further characterize what is going on, as well as in understanding how the kernel was configured and what hardware it is running on.

Jason.
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Sergei Shtylyov wrote:
Jason Wessel wrote:
This patch fixes some corner cases where KGDB will silently hang or kill the system, if a user accidentally tries to source step into a spin_unlock() call or source step in on a macro containing smp_processor_id(). The use of raw_smp_processor_id is desired in kernel/kgdb.c to fix this particular problem.

Hmm, looks like it might fix the issues we also had, and even render the patch which makes KGDB dependent on !DEBUG_PREEMPT (and some others) unneeded...

Which other parts did you think were not needed? It would be good to dump some of that. If this patch doesn't fix the !DEBUG_PREEMPT issue, perhaps the next one I am sending out about single stepping will fix it. If not can you provide me with a test case to see the problem? I have no intention of applying the !DEBUG_PREEMPT patch until I have enough information to reproduce the problem, or a characterization that adequately explains why it cannot work. Ideally I would like to see the root cause identified so as to craft a solution.

Hm, I wonder whether the override will be needed for other architectures...

Sure, the x86_64 arch needs to change too, and the change below should address it. I don't know of any other arch that needs a special rewind.

Signed-off-by: Jason Wessel [EMAIL PROTECTED]

Index: linux-2.6.21-standard/arch/x86_64/kernel/kgdb.c
===================================================================
--- linux-2.6.21-standard.orig/arch/x86_64/kernel/kgdb.c
+++ linux-2.6.21-standard/arch/x86_64/kernel/kgdb.c
@@ -325,6 +325,14 @@ int kgdb_skipexception(int exception, st
 	return 0;
 }
 
+unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs)
+{
+	if (exception == 3) {
+		return instruction_pointer(regs) - 1;
+	}
+	return instruction_pointer(regs);
+}
+
 struct kgdb_arch arch_kgdb_ops = {
 	.gdb_bpt_instr = {0xcc},
 	.flags = KGDB_HW_BREAKPOINT,
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Sergei Shtylyov wrote:
Hello.

Jason Wessel wrote:
If not can you provide me with a test case to see the problem?

Well, for example, with 2.6.18-rt7 kernel, stepping into smp_processor_id() was blowing away U-Boot (!) on my PPC board (even in PREEMPT_DESKTOP mode)... On x86, stepping past preempt_disable() caused reboots.

Sergei,

Now that sounds more interesting. Perhaps you can elaborate on the configuration further, or provide the results of a test. I wonder if I have the same board I can try that test on? In theory it should already be fixed, after merging the code. On ppc boards I have seen this happen with evlog before, where it can scramble the boot loader.

For the x86 problem, which part of the kernel should I step through to try out the preempt_disable()? Are you using a low level single step, a source step in, step over, or finish frame? Certainly you cannot step through all parts of the kernel, but it would be good to understand where you cannot and why it would work by turning off DEBUG_PREEMPT.

If you do come across another corner case, please let me know and I can try to look at it further.

Jason.
Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB
Jason Wessel wrote:
This patch fixes some corner cases where KGDB will silently hang or kill the system, if a user accidentally tries to source step into a spin_unlock() call or source step in on a macro containing smp_processor_id(). The use of raw_smp_processor_id is desired in kernel/kgdb.c to fix this particular problem.

To fix issues with accidental source step in on spin_unlock(), the idea is to check for the existence of a break point on the second entry to kgdb and try to remove it. Next an attempt will be made to continue normal operations. A third entry will generate a panic(), so as to stop infinite loops.

Testing has shown that kgdb is much more robust with these changes and random accidental run control.

Signed-off-by: Jason Wessel [EMAIL PROTECTED]

This patch is committed in the linux2_6_21_uprev branch across:
core-lite.patch
core.patch
i386-lite.patch
x86_64-lite.patch

Jason.