On Sat, Mar 22, 2003 at 12:02:54PM +1100, Keith Owens wrote:
> On Fri, 21 Mar 2003 16:35:02 -0500 (EST), 
> Hua Qin <[EMAIL PROTECTED]> wrote:
> >I downloaded the 2.4.20 kernel from kernel.org and applied the patches
> >kdb-v3.0-2.4.10-common and kdb-v3.0-2.4.20-i386 to it.
> >Then I put this kernel (compiled with gcc-2.96) on an SMP machine and set a
> >breakpoint using bp.
> >After the machine hit the breakpoint, I used the bt and cpu commands to check
> >the status. Then I typed bc 0 to clear that breakpoint, and go to try
> >to leave kdb. But I can never get out; it always hits "Oops int3". I think
> >the problem only happens on SMP machines, because another UP
> >(uniprocessor) machine never has this problem.
> >
> >Today I tried v4.0 and other two patches kdb-smphdr-v4.0; the same problem
> >happened.
> >
> >Does someone have a solution for this?
> 
> Known race condition with no solution yet.  man Documentation/kdb/kdb_bp.man
> and look for 'SMP'.
> 
> 
Here is a simple way to handle this situation. Before deciding that
a trap is not due to one of KDB's breakpoints, check whether it comes
from a breakpoint that was recently removed. You can do this by
looking for the INT3 instruction at the trap location: if it is no
longer there, we can conclude that the trap was caused by a recently
removed breakpoint. I use a similar technique in dynamic probes too.
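
To illustrate the idea outside the kernel, here is a minimal user-space sketch in C. It is not the patch below: the names (bp_slot, bp_install, bp_clear, classify_trap, the TRAP_* values) are all made up for this example, the "text" is just a byte array, and trap_addr is assumed to be the address of the breakpoint byte itself (already adjusted back from the saved instruction pointer).

```c
#include <assert.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC	/* one-byte x86 breakpoint instruction */

/* Hypothetical classification of a breakpoint trap. */
enum trap_kind {
	TRAP_OURS,	/* matches an installed breakpoint */
	TRAP_REMOVED,	/* no table entry and no INT3 left at the address */
	TRAP_FOREIGN	/* INT3 is still there but it is not ours */
};

/* Hypothetical breakpoint table entry. */
struct bp_slot {
	size_t addr;		/* offset of the breakpoint in the text */
	int in_use;		/* non-zero while the breakpoint is installed */
	unsigned char saved;	/* original byte replaced by INT3 */
};

/* Install: remember the original byte, then write INT3 over it. */
static void bp_install(unsigned char *text, struct bp_slot *bp)
{
	bp->saved = text[bp->addr];
	text[bp->addr] = INT3_OPCODE;
	bp->in_use = 1;
}

/* Clear: restore the original byte and free the table entry. */
static void bp_clear(unsigned char *text, struct bp_slot *bp)
{
	text[bp->addr] = bp->saved;
	bp->in_use = 0;
}

/*
 * Decide what a trap at trap_addr means.  First look in the table;
 * if nothing matches, read the byte at the trap location.  If it is
 * no longer INT3, the breakpoint must have been removed after the
 * trap was taken, so the hit can be ignored safely.
 */
static enum trap_kind classify_trap(const unsigned char *text, size_t text_len,
				    const struct bp_slot *table, size_t nslots,
				    size_t trap_addr)
{
	size_t i;

	for (i = 0; i < nslots; i++) {
		if (table[i].in_use && table[i].addr == trap_addr)
			return TRAP_OURS;
	}

	if (trap_addr < text_len && text[trap_addr] != INT3_OPCODE)
		return TRAP_REMOVED;

	return TRAP_FOREIGN;
}
```

With a four-byte text of NOPs and one breakpoint at offset 1, classify_trap() reports TRAP_OURS while the breakpoint is installed, TRAP_REMOVED after bp_clear(), and TRAP_FOREIGN for an INT3 that was written into the text without going through the table. The SMP race corresponds to the second case: one cpu took the trap, another cpu cleared the breakpoint before the first one got to look at the table.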

Comments?

Hua: can you give this a try and see if it helps?
-- 
Vamsi Krishna S.
IBM Software Lab, Bangalore.
Ph: +91 80 5044959
Internet: [EMAIL PROTECTED]
--
diff -urN -X /home/vamsi/.dontdiff 2420-kdb4.0-pure/arch/i386/kdb/kdba_bp.c 2420-kdb4.0/arch/i386/kdb/kdba_bp.c
--- 2420-kdb4.0-pure/arch/i386/kdb/kdba_bp.c    2003-03-31 14:32:47.000000000 +0530
+++ 2420-kdb4.0/arch/i386/kdb/kdba_bp.c 2003-03-31 14:27:50.000000000 +0530
@@ -294,13 +294,9 @@
  *     If multiple processors receive debug exceptions simultaneously,
  *     one may be waiting at the kdb fence in kdb() while the user
  *     issues a 'bc' command to clear the breakpoint the processor which
- *     is waiting has already encountered.   If this is the case, the
- *     debug registers will no longer match any entry in the breakpoint
- *     table, and we'll return the value '3'.  This can cause a panic
- *     in die_if_kernel().  It is safer to disable the breakpoint (bd),
- *     'go' until all processors are past the breakpoint then clear the
- *     breakpoint (bc).  This code recognises a breakpoint even when
- *     disabled but not when it has been cleared.
+ *     is waiting has already encountered.   To handle this case,
+ *     check if we are here due to a breakpoint that is recently
+ *     removed and if so, simply ignore this hit.
  *
  *     WARNING: This routine resets the eip.  It should be called
  *              once per breakpoint and the result cached.
@@ -312,6 +308,7 @@
        int i;
        kdb_dbtrap_t rv;
        kdb_bp_t *bp;
+       unsigned char opcode;
 
        if (KDB_NULL_REGS(regs))
                return KDB_DB_NOBPT;
@@ -350,6 +347,21 @@
                }
        }
 
+       /*
+        * This doesn't seem to be one of our breakpoints. Before deciding
+        * that this is not one of ours, make sure that this is not one of our
+        * breakpoints that was recently removed. If the breakpoint 
+        * instruction is not present at the trap location, we assume it 
+        * is our breakpoint that was recently removed and consider it 
+        * handled (return !0 from kdb()) so that we don't cause any 
+        * unhandled int3's in kernel which results in kernel panic.
+        */
+       if (!kdb_getword(&opcode, regs->eip, 1)) {
+               if (opcode != IA32_BREAKPOINT_INSTRUCTION) {
+                       rv = KDB_DB_REMOVED_BPT;
+               }
+       }       
+
        return rv;
 }
 
diff -urN -X /home/vamsi/.dontdiff 2420-kdb4.0-pure/Documentation/kdb/kdb_bp.man 2420-kdb4.0/Documentation/kdb/kdb_bp.man
--- 2420-kdb4.0-pure/Documentation/kdb/kdb_bp.man       2003-03-31 14:32:42.000000000 +0530
+++ 2420-kdb4.0/Documentation/kdb/kdb_bp.man    2003-03-31 14:28:57.000000000 +0530
@@ -121,23 +121,6 @@
 The breakpoint subsystem does not currently use any environment
 variables.
 .SH SMP CONSIDERATIONS
-Using
-.B bc
-is risky on SMP systems.
-If you clear a breakpoint when another cpu has hit that breakpoint but
-has not been processed then it may not be recognised as a kdb
-breakpoint, usually resulting in incorrect program counters and kernel
-panics.
-It is safer to disable the breakpoint with
-.BR bd ,
-then
-.B go
-to let any other processors that are waiting on the breakpoint to
-clear.
-After all processors are clear of the disabled breakpoint then it is
-safe to clear it using
-.BR bc .
-.P
 Breakpoints which use the processor breakpoint registers
 are only established on the processor which is
 currently active.  If you wish breakpoints to be universal
diff -urN -X /home/vamsi/.dontdiff 2420-kdb4.0-pure/include/linux/kdbprivate.h 2420-kdb4.0/include/linux/kdbprivate.h
--- 2420-kdb4.0-pure/include/linux/kdbprivate.h 2003-03-31 14:32:41.000000000 +0530
+++ 2420-kdb4.0/include/linux/kdbprivate.h      2003-03-31 14:14:40.000000000 +0530
@@ -265,7 +265,8 @@
        KDB_DB_SS,      /* Single-step trap */
        KDB_DB_SSB,     /* Single step to branch */
        KDB_DB_SSBPT,   /* Single step over breakpoint */
-       KDB_DB_NOBPT    /* Spurious breakpoint */
+       KDB_DB_NOBPT,   /* Spurious breakpoint */
+       KDB_DB_REMOVED_BPT      /* Recently removed breakpoint */
 } kdb_dbtrap_t;
 
 extern kdb_dbtrap_t kdba_db_trap(struct pt_regs *, int);       /* DEBUG trap/fault handler */
diff -urN -X /home/vamsi/.dontdiff 2420-kdb4.0-pure/kdb/kdbmain.c 2420-kdb4.0/kdb/kdbmain.c
--- 2420-kdb4.0-pure/kdb/kdbmain.c      2003-03-31 14:32:41.000000000 +0530
+++ 2420-kdb4.0/kdb/kdbmain.c   2003-03-31 14:13:42.000000000 +0530
@@ -1435,10 +1435,14 @@
                db_result = kdba_db_trap(regs, error);  /* Only call this once */
        }
 
-       if ((reason == KDB_REASON_BREAK || reason == KDB_REASON_DEBUG)
-        && db_result == KDB_DB_NOBPT) {
-               KDB_DEBUG_STATE("kdb 2", reason);
-               return 0;       /* Not one of mine */
+       if ((reason == KDB_REASON_BREAK || reason == KDB_REASON_DEBUG)) {
+               if (db_result == KDB_DB_NOBPT) {
+                       KDB_DEBUG_STATE("kdb 2", reason);
+                       return 0;       /* Not one of mine */
+               } else if (db_result == KDB_DB_REMOVED_BPT) {
+                       KDB_DEBUG_STATE("kdb 2a", reason);
+                       return 1;
+               }
        }
 
        /* Turn off single step if it was being used */
