Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-06-14 Thread Sergei Shtylyov
Sergei Shtylyov wrote:
If there are no objections, I will commit the following patch.

   Sorry for the belated objections. :-

Signed-off-by: Jason Wessel [EMAIL PROTECTED]

Index: linux-2.6.21.1/kernel/kgdb.c
===
--- linux-2.6.21.1.orig/kernel/kgdb.c
+++ linux-2.6.21.1/kernel/kgdb.c
[...]
@@ -848,12 +856,36 @@ int kgdb_handle_exception(int ex_vector,
long kgdb_usethreadid = 0;
int error = 0, all_cpus_synced = 0;
struct pt_regs *shadowregs;
-int processor = smp_processor_id();
+int processor = raw_smp_processor_id();
void *local_debuggerinfo;

/* Panic on recursive debugger calls. */
-if (atomic_read(&debugger_active) == smp_processor_id() + 1)
+if (atomic_read(&debugger_active) == raw_smp_processor_id() + 1) {
+exception_level++;
+addr = kgdb_arch_pc(ex_vector, linux_regs);
+kgdb_deactivate_sw_breakpoints();
+if (kgdb_remove_sw_break(addr) == 0) {
+/* If the break point removed ok at the place exception
+ * occurred, try to recover and print a warning to the end
+ * user because the user planted a breakpoint in a place
+ * that KGDB needs in order to function.
+ */
+exception_level = 0;
+kgdb_skipexception(ex_vector, linux_regs);
+kgdb_activate_sw_breakpoints();
+printk(KERN_CRIT "KGDB: re-enter exception: breakpoint removed\n");
+WARN_ON(1);

And won't WARN_ON(1) in turn cause an invalid operation exception?

+return 0;
+}

WBR, Sergei

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-06-14 Thread Sergei Shtylyov
Hello, I wrote:
If there are no objections, I will commit the following patch.

-- 

This patch fixes some corner cases where KGDB will silently hang or
kill the system, if a user accidentally tries to source step into a
spin_unlock() call or source step in on a macro containing
smp_processor_id().  The use of raw_smp_processor_id is desired in
kernel/kgdb.c to fix this particular problem.

To fix issues with accidental source step in on spin_unlock(), the
idea is to check for the existence of a break point on the second
entry to kgdb and try to remove it.  Next an attempt will be made to
continue normal operations.  A third entry will generate a panic(), so
as to stop infinite loops.
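
A minimal userspace sketch of the re-entry check described above, modelled on the hunk in kernel/kgdb.c as posted. The struct, enum and function names here are hypothetical stand-ins, not kernel symbols, and the normal-entry path is reduced to a plain assignment instead of the cmpxchg loop the real code uses. Note that in the posted hunk the fall-through path always ends in panic() once the breakpoint at the faulting address cannot be removed.

```c
#include <assert.h>

/* Outcome of one entry into the simulated exception handler. */
enum kgdb_entry { ENTRY_NORMAL, ENTRY_RECOVERED, ENTRY_PANIC };

/* Hypothetical stand-in for the state the patch touches. */
struct kgdb_sim {
    int debugger_active;   /* 0 when free, otherwise owning cpu + 1 */
    int exception_level;   /* guard for recursive entry */
};

/*
 * bp_at_fault_addr models "kgdb_remove_sw_break(addr) == 0", i.e. a
 * software breakpoint was planted at the faulting address and could
 * be removed.
 */
enum kgdb_entry sim_handle_exception(struct kgdb_sim *s, int cpu,
                                     int bp_at_fault_addr)
{
    assert(s != 0 && cpu >= 0);

    if (s->debugger_active != cpu + 1) {
        /* First entry on this CPU: take ownership (simplified). */
        s->debugger_active = cpu + 1;
        return ENTRY_NORMAL;
    }

    /* Re-entry while this CPU already owns the debugger. */
    s->exception_level++;
    if (bp_at_fault_addr) {
        /* Breakpoint removed: reset the guard and try to carry on. */
        s->exception_level = 0;
        return ENTRY_RECOVERED;
    }

    /* Nothing to remove at the faulting address: give up. */
    return ENTRY_PANIC;
}
```

Calling the function twice for the same CPU, first with no breakpoint at the faulting address and then with one, gives ENTRY_NORMAL followed by ENTRY_RECOVERED; a further re-entry with nothing to remove gives ENTRY_PANIC.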

Testing has shown that kgdb is much more robust with these changes
and random accidental run control.

 It seems that something has been broken WRT backtracing...

Here's the more relevant trace -- got it at the initial KGDB breakpoint:

breakpoint () at kernel/kgdb.c:1773
1773}

(gdb) bt
#0  breakpoint () at kernel/kgdb.c:1773
#1  0x808c8811 in kgdb_late_entry () at kernel/kgdb.c:1752
#2  0x808b2a29 in init (unused=0x0) at init/main.c:692
#3  0x8020ae38 in child_rip ()
#4  0x in ?? ()
Ignoring packet error, continuing...
Reply contains invalid hex digit 116

And here's the start of the endless trace on the target -- it suggests that KGDB 
has caught a GPF:

KGDB: re-enter exception: ALL breakpoints removed
Kernel panic - not syncing: Recursive entry to debugger

Call Trace:
  #DB  [80228ccb] panic+0xe9/0x1a9
  [80258f3a] kgdb_handle_exception+0x11c/0xf92
  [80585cad] kfree_skbmem+0x84/0x8b
  [802586f3] kgdb_mem2hex+0x311/0x35d
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236d61] raw_notifier_call_chain+0x9/0xb
  [8020c3ef] do_general_protection+0xf7/0x121
  [8062f46d] error_exit+0x0/0x84
  [80258408] kgdb_mem2hex+0x26/0x35d
  [802586f3] kgdb_mem2hex+0x311/0x35d
  [80258408] kgdb_mem2hex+0x26/0x35d
  [80258910] kgdb_hex2long+0x29/0x5f
  [80259534] kgdb_handle_exception+0x716/0xf92
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236d61] raw_notifier_call_chain+0x9/0xb
  [8020c621] do_int3+0x44/0xa7
  [8062f7b3] int3+0x93/0xb0
  [80259eb6] breakpoint+0xf/0x1b
  EOE  [808c8811] kgdb_late_entry+0xba/0xc3
  [808b2a29] init+0x1a2/0x37c
  [8020ae38] child_rip+0xa/0x12
  [803f8cb4] acpi_ds_init_one_object+0x0/0x88
  [808b2887] init+0x0/0x37c
  [8020ae2e] child_rip+0x0/0x12

Kernel panic - not syncing: Recursive entry to debugger

Call Trace:
  #DB[1]  [80228ccb] panic+0xe9/0x1a9
  [8021d47a] kgdb_skipexception+0x22/0x3e
  [80258f3a] kgdb_handle_exception+0x11c/0xf92
  [80259eb5] breakpoint+0xe/0x1b
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236d61] raw_notifier_call_chain+0x9/0xb
  [8020c621] do_int3+0x44/0xa7
  [8062f7b3] int3+0x93/0xb0
  [80259eb6] breakpoint+0xf/0x1b
  EOE  #DB  [80259ecb] kgdb_panic_notify+0x9/0xd
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236c0e] atomic_notifier_call_chain+0x34/0x4e
  [80228cee] panic+0x10c/0x1a9
  [80258f3a] kgdb_handle_exception+0x11c/0xf92
  [80585cad] kfree_skbmem+0x84/0x8b
  [802586f3] kgdb_mem2hex+0x311/0x35d
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236d61] raw_notifier_call_chain+0x9/0xb
  [8020c3ef] do_general_protection+0xf7/0x121
  [8062f46d] error_exit+0x0/0x84
  [80258408] kgdb_mem2hex+0x26/0x35d
  [802586f3] kgdb_mem2hex+0x311/0x35d
  [80258408] kgdb_mem2hex+0x26/0x35d
  [80258910] kgdb_hex2long+0x29/0x5f
  [80259534] kgdb_handle_exception+0x716/0xf92
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] notifier_call_chain+0x2c/0x43
  [80236d61] raw_notifier_call_chain+0x9/0xb
  [8020c621] do_int3+0x44/0xa7
  [8062f7b3] int3+0x93/0xb0
  [80259eb6] breakpoint+0xf/0x1b
  EOE  [808c8811] kgdb_late_entry+0xba/0xc3
  [808b2a29] init+0x1a2/0x37c
  [8020ae38] child_rip+0xa/0x12
  [803f8cb4] acpi_ds_init_one_object+0x0/0x88
  [808b2887] init+0x0/0x37c
  [8020ae2e] child_rip+0x0/0x12

Kernel panic - not syncing: Recursive entry to debugger

Call Trace:

[a lot of traceless panics skipped]

Call Trace:
  #MC  [80228ccb] panic+0xe9/0x1a9
  [8021d47a] kgdb_skipexception+0x22/0x3e
  [80258f3a] kgdb_handle_exception+0x11c/0xf92
  [80259eb5] breakpoint+0xe/0x1b
  [8021d433] kgdb_notify+0xfb/0x105
  [80236b36] 

Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-23 Thread Sergei Shtylyov
Hello.

Jason Wessel wrote:

 If not can you provide me with a test case to see the problem?

Well, for example, with 2.6.18-rt7 kernel, stepping into 
 smp_processor_id() was blowing away U-Boot (!) on my PPC board (even 
 in PREEMPT_DESKTOP mode)... On x86, stepping past preempt_disable() 
 caused reboots.

 Sergei,

 Now that sounds more interesting.  Perhaps you can elaborate on the 
 configuration further, or provide the results of a test.  I wonder if I 

It was happening in PREEMPT_DESKTOP mode when stepping into 
slab_irq_save() -- which contains cpu_processor_id() in this mode.

 have the same board I can try that test on?  In theory it should already 
 be fixed, after merging the code.  On ppc boards I have seen this happen 
 with evlog before where it can scramble the boot loader.

Looks like it's been fixed indeed. But still the session stalls in RT mode 
-- here the macro looks different...

 For the x86 problem, which part of the kernel should I step through to 
 try out the preempt_disable()?   Are you using a low level single step, 
 a source step in, step over, or finish frame?  Certainly you cannot step 
 through all parts of the kernel, but it would be good to understand 
 where you cannot and why it would work by turning off DEBUG_PREEMPT.

This also seems to work now.
Thanks for unravelling this mystery (which we were too lazy to do :-).

 Jason.

WBR, Sergei



Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-20 Thread Bob Picco
Sergei Shtylyov wrote:  [Sat May 19 2007, 02:54:00PM EDT]
 Jason Wessel wrote:
 
  This patch is committed in the linux2_6_21_uprev branch across:
  core-lite.patch core.patch i386-lite.patch x86_64-lite.patch
 
 BTW, what's the reason we *still* have both 
 {core|i386|ia64|powerpc|x86_64}-lite.patch and 
 core{core|i386|ia64|powerpc|x86_64}.patch? Why not just merge them?
 
  Jason.
 
 WBR, Sergei
Well I thought that core related patches weren't destined for mainline.
At least I think that was the objective in 2006. For ia64, a coworker
attempted to use Jason's git repository. The details aren't
known to me, but he wasn't successful. The ia64.patch in core would
cause significant resistance when attempting to go upstream. So please
withdraw it from inclusion in the git tree.

The patch primarily supports enabling breakpoints shortly after entering
setup_arch. It has doubtful usefulness to others.

thanks,

bob



Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-19 Thread Sergei Shtylyov
Jason Wessel wrote:

 This patch is committed in the linux2_6_21_uprev branch across:
 core-lite.patch core.patch i386-lite.patch x86_64-lite.patch

BTW, what's the reason we *still* have both 
{core|i386|ia64|powerpc|x86_64}-lite.patch and 
core{core|i386|ia64|powerpc|x86_64}.patch? Why not just merge them?

 Jason.

WBR, Sergei



Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-15 Thread Pete/Piet Delaney
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Jason Wessel wrote:
 If there are no objections, I will commit the following patch.

Sounds great to me; avoiding spin locks() is a hassle.

Ever noticed a problem with kgdb surviving a weekend of non-use?

I hit a breakpoint on Saturday and hoped to continue looking at it
today but the kgdb-stub, as usual, got out of phase and I had
to re-do the test that caused the breakpoint; I've seen it happen
many times.

- -piet

 
 --
 
 This patch fixes some corner cases where KGDB will silently hang or
 kill the system, if a user accidentally tries to source step into a
 spin_unlock() call or source step in on a macro containing
 smp_processor_id().  The use of raw_smp_processor_id is desired in
 kernel/kgdb.c to fix this particular problem.
 
 To fix issues with accidental source step in on spin_unlock(), the
 idea is to check for the existence of a break point on the second
 entry to kgdb and try to remove it.  Next an attempt will be made to
 continue normal operations.  A third entry will generate a panic(), so
 as to stop infinite loops.
 
 Testing has shown that kgdb is much more robust with these changes
 and random accidental run control.
 
 Signed-off-by: Jason Wessel [EMAIL PROTECTED]
 
 ---
  arch/i386/kernel/kgdb.c |8 
  kernel/kgdb.c   |   47 
 ---
  2 files changed, 48 insertions(+), 7 deletions(-)
 
 Index: linux-2.6.21.1/kernel/kgdb.c
 ===
 --- linux-2.6.21.1.orig/kernel/kgdb.c
 +++ linux-2.6.21.1/kernel/kgdb.c
 @@ -70,6 +70,8 @@ int kgdb_connected;
  int kgdb_may_fault;
  /* All the KGDB handlers are installed */
  int kgdb_from_module_registered = 0;
 +/* Guard for recursive entry */
 +static int exception_level = 0;
  
  /* We provide a kgdb_io_ops structure that may be overriden. */
  struct kgdb_io __attribute__ ((weak)) kgdb_io_ops;
 @@ -166,6 +168,12 @@ int __attribute__ ((weak))
  return 0;
  }
  
 +unsigned long __attribute__ ((weak))
 +kgdb_arch_pc(int exception, struct pt_regs *regs)
 +{
 +return instruction_pointer(regs);
 +}
 +
  static int hex(char ch)
  {
  if ((ch >= 'a') && (ch <= 'f'))
 @@ -580,11 +588,11 @@ static void kgdb_wait(struct pt_regs *re
  int processor;
  
  local_irq_save(flags);
 -processor = smp_processor_id();
 +processor = raw_smp_processor_id();
  kgdb_info[processor].debuggerinfo = regs;
  kgdb_info[processor].task = current;
  atomic_set(&procindebug[processor], 1);
 -atomic_set(&kgdb_sync_softlockup[smp_processor_id()], 1);
 +atomic_set(&kgdb_sync_softlockup[raw_smp_processor_id()], 1);
  
  /* Wait till master processor goes completely into the debugger.
   * FIXME: this looks racy */
 @@ -784,7 +792,7 @@ static inline int shadow_pid(int realpid
  if (realpid) {
  return realpid;
  }
 -return pid_max + smp_processor_id();
 +return pid_max + raw_smp_processor_id();
  }
  
  static char gdbmsgbuf[BUFMAX + 1];
 @@ -848,12 +856,36 @@ int kgdb_handle_exception(int ex_vector,
  long kgdb_usethreadid = 0;
  int error = 0, all_cpus_synced = 0;
  struct pt_regs *shadowregs;
 -int processor = smp_processor_id();
 +int processor = raw_smp_processor_id();
  void *local_debuggerinfo;
  
  /* Panic on recursive debugger calls. */
 -if (atomic_read(&debugger_active) == smp_processor_id() + 1)
 +if (atomic_read(&debugger_active) == raw_smp_processor_id() + 1) {
 +exception_level++;
 +addr = kgdb_arch_pc(ex_vector, linux_regs);
 +kgdb_deactivate_sw_breakpoints();
 +if (kgdb_remove_sw_break(addr) == 0) {
 +/* If the break point removed ok at the place exception
 + * occurred, try to recover and print a warning to the end
 + * user because the user planted a breakpoint in a place
 + * that KGDB needs in order to function.
 + */
 +exception_level = 0;
 +kgdb_skipexception(ex_vector, linux_regs);
 +kgdb_activate_sw_breakpoints();
 +printk(KERN_CRIT "KGDB: re-enter exception: breakpoint removed\n");
 +WARN_ON(1);
 +return 0;
 +}
 +remove_all_break();
 +kgdb_skipexception(ex_vector, linux_regs);
 +if (exception_level > 1)
 +panic("Recursive entry to debugger");
 +
 +printk(KERN_CRIT "KGDB: re-enter exception: ALL breakpoints removed\n");
 +panic("Recursive entry to debugger");
  return 0;
 +}
  
   acquirelock:
  
 @@ -864,7 +896,7 @@ int kgdb_handle_exception(int ex_vector,
  local_irq_save(flags);
  
  /* Hold debugger_active */
 -procid = smp_processor_id();
 +procid = raw_smp_processor_id();
  
  while (cmpxchg(&atomic_read(&debugger_active), 0, (procid + 1)) != 0) {
  int i = 25;/* an arbitrary number */
 @@ -877,7 +909,7 @@ int 

Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-15 Thread Jason Wessel
Pete/Piet Delaney wrote:
 Sounds great to me; avoiding spin locks() is a hassle.

 Ever noticed a problem with kgdb surviving a weekend of non-use?

 I hit a breakpoint on Saturday and hoped to continue looking at it
 today but the kgdb-stub, as usual, got out of phase and I had
 to re-do the test that caused the breakpoint; I've seen it happen
 many times.

 - -piet
   

Do you have a gdb serial log that shows the connection was actually 
still alive?

Depending on what you are debugging the state of timers and real-time 
clock could have something to do with it, when leaving the system paused 
for that length of time.  As you probably know KGDB will busy spin the 
processors the whole time that KGDB is active.  It is also plausible 
that the system's NET_POLL driver had a failure as it would certainly 
continue to transmit and receive response for ARP requests.

I have not seen this type of failure first hand, but I am interested if 
there is a way to further characterize what is going on, as well as 
understanding how the kernel was configured and what hardware it is 
running on.

Jason.



Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-15 Thread Jason Wessel
Sergei Shtylyov wrote:
 Jason Wessel wrote:

 This patch fixes some corner cases where KGDB will silently hang or
 kill the system, if a user accidentally tries to source step into a
 spin_unlock() call or source step in on a macro containing
 smp_processor_id().  The use of raw_smp_processor_id is desired in
 kernel/kgdb.c to fix this particular problem.

Hmm, looks like it might fix the issues we also had, and even 
 render the patch which makes KGDB dependent on !DEBUG_PREEMPT (and 
 some others) unneeded...


Which other parts did you think were not needed?  It would be good to 
dump some of that.   If this patch doesn't fix the !DEBUG_PREEMPT, 
perhaps the next one I am sending out about single stepping will fix 
it.   If not can you provide me with a test case to see the problem?

I had no intention of applying the !DEBUG_PREEMPT patch until I have 
enough information to reproduce the problem or characterization that 
adequately explains why it has no possibility to work.  Ideally I would 
like to see the root cause identified so as to craft a solution.

Hm, I wonder whether the override will be needed for other 
 architectures...


Sure the x86_64 arch needs to change too, and the change below should 
address it.  I don't know that any other arch has a special rewind that 
is needed.

Signed-off-by: Jason Wessel [EMAIL PROTECTED]


Index: linux-2.6.21-standard/arch/x86_64/kernel/kgdb.c
===
--- linux-2.6.21-standard.orig/arch/x86_64/kernel/kgdb.c
+++ linux-2.6.21-standard/arch/x86_64/kernel/kgdb.c
@@ -325,6 +325,14 @@ int kgdb_skipexception(int exception, st
return 0;
 }
 
+unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs)
+{
+   if (exception == 3) {
+   return instruction_pointer(regs) - 1;
+   }
+   return instruction_pointer(regs);
+}
+
 struct kgdb_arch arch_kgdb_ops = {
.gdb_bpt_instr = {0xcc},
.flags = KGDB_HW_BREAKPOINT,
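
A small userspace sketch of why the rewind matters, under stated assumptions: on x86 the int3 opcode (0xcc) is one byte long and the breakpoint exception (vector 3) is delivered with the saved instruction pointer already past it, so a lookup by the raw saved IP misses the address where the breakpoint was planted. The sim_* names and the table layout below are hypothetical and only mirror the shape of kgdb_arch_pc() above; they are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_VECTOR_INT3 3   /* x86 breakpoint exception vector */

/* Hypothetical stand-in for the saved IP in struct pt_regs. */
struct sim_regs {
    uintptr_t ip;
};

/* Hypothetical software breakpoint table entry. */
struct sim_break {
    uintptr_t addr;   /* address where 0xcc was written */
    int active;
};

/* Same shape as the x86_64 kgdb_arch_pc() in the patch above. */
uintptr_t sim_arch_pc(int exception, const struct sim_regs *regs)
{
    assert(regs != 0);
    if (exception == SIM_VECTOR_INT3)
        return regs->ip - 1;   /* step back over the 1-byte int3 */
    return regs->ip;
}

/* Return the index of the active breakpoint at addr, or -1 if none. */
int sim_find_break(const struct sim_break *tbl, int n, uintptr_t addr)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].active && tbl[i].addr == addr)
            return i;
    return -1;
}
```

With a breakpoint recorded at 0x1000 and a saved IP of 0x1001, looking up the raw IP finds nothing, while looking up sim_arch_pc(3, &regs) finds entry 0.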







Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-15 Thread Jason Wessel
Sergei Shtylyov wrote:
 Hello.

 Jason Wessel wrote:
 If not can you provide me with a test case to see the problem?

Well, for example, with 2.6.18-rt7 kernel, stepping into 
 smp_processor_id() was blowing away U-Boot (!) on my PPC board (even 
 in PREEMPT_DESKTOP mode)... On x86, stepping past preempt_disable() 
 caused reboots.

Sergei,

Now that sounds more interesting.  Perhaps you can elaborate on the 
configuration further, or provide the results of a test.  I wonder if I 
have the same board I can try that test on?  In theory it should already 
be fixed, after merging the code.  On ppc boards I have seen this happen 
with evlog before where it can scramble the boot loader.

For the x86 problem, which part of the kernel should I step through to 
try out the preempt_disable()?   Are you using a low level single step, 
a source step in, step over, or finish frame?  Certainly you cannot step 
through all parts of the kernel, but it would be good to understand 
where you cannot and why it would work by turning off DEBUG_PREEMPT.

If you do come across another corner case please let me know and I can try 
and look at it further.

Jason.





Re: [Kgdb-bugreport] [PATCH] attempt fix up breakpoint on reenter to KGDB

2007-05-15 Thread Jason Wessel
Jason Wessel wrote:
 This patch fixes some corner cases where KGDB will silently hang or
 kill the system, if a user accidentally tries to source step into a
 spin_unlock() call or source step in on a macro containing
 smp_processor_id().  The use of raw_smp_processor_id is desired in
 kernel/kgdb.c to fix this particular problem.

 To fix issues with accidental source step in on spin_unlock(), the
 idea is to check for the existence of a break point on the second
 entry to kgdb and try to remove it.  Next an attempt will be made to
 continue normal operations.  A third entry will generate a panic(), so
 as to stop infinite loops.

 Testing has shown that kgdb is much more robust with these changes
 and random accidental run control.

 Signed-off-by: Jason Wessel [EMAIL PROTECTED]

This patch is committed in the linux2_6_21_uprev branch across:
core-lite.patch core.patch i386-lite.patch x86_64-lite.patch

Jason.
