Folks,

I've found a problem in the Linux 2.2 series kernels related to
watchpoint handling which affects gdb. I mailed a patch to the kernel
mailing list, but as far as I can tell no one read it. So, what I'm
really looking for is someone on this list who can attract the
attention of a relevant kernel developer (Alan Cox?).

The problem is that if you set a watchpoint on some data and then
write over it from inside the kernel (by doing a read onto it, for
instance), not only is that missed, but also (and WORSE) the
watchpoint is completely disabled until the process gets rescheduled.

Here's a test case and sample gdb session demonstrating the bug...

/* Watchpoint test program. */
#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <unistd.h>     /* read, close */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/**********************************************************************/
/* Reads zeroes into a buf. */
void read_zero_into (void *buf, int len)
{
  int fd = open ("/dev/zero", O_RDONLY);
  read (fd, buf, len);
  close (fd);
}

void nop ()
{
}

/**********************************************************************/
/* Globals reference tests. */

volatile int gi = -1; /* Ensure writes happen by tagging it volatile */

void globals_test()
{
  nop();
  gi = 26;
  nop();

  read_zero_into ((void *)&gi, sizeof(gi)); /* Hit the watchpoint from the kernel */

  nop();
  gi = 41; /* Should stop */
  nop();
}

int main()
{
  globals_test();
  exit (0);
}
-------
% uname -a
Linux pc2 2.2.16 #4 Tue Jun 13 13:36:26 BST 2000 i686 unknown
% gcc -g -o tst_watch tst_watch.c
% gdb tst_watch
GNU gdb 4.17.0.11 with Linux support
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) watch gi
Hardware watchpoint 1: gi
(gdb) run
Starting program: /home1/jim/tmp/tst_watch
Hardware watchpoint 1: gi
Hardware watchpoint 1: gi
Hardware watchpoint 1: gi

Old value = -1
New value = 26
globals_test () at tst_watch.c:31
31        nop();
(gdb) cont
Continuing.

Program exited normally.

*** Note that the watchpoint hit on line 37 was missed (by the kernel)

(gdb) run
Starting program: /home1/jim/tmp/tst_watch
Hardware watchpoint 1: gi

Old value = -1
New value = 26
globals_test () at tst_watch.c:31
31        nop();
(gdb) list
26
27      void globals_test()
28      {
29        nop();
30        gi = 26;
31        nop();
32
33        read_zero_into ((void *)&gi, sizeof(gi)); /* Hit the watchpoint from the kernel */
34
35        nop();
(gdb) break 35
Breakpoint 2 at 0x80484fc: file tst_watch.c, line 35.
(gdb) cont
Continuing.
Hardware watchpoint 1: gi

Old value = 26
New value = 0

Breakpoint 2, globals_test () at tst_watch.c:35
35        nop();
(gdb) cont
Continuing.
Hardware watchpoint 1: gi

Old value = 0
New value = 41
globals_test () at tst_watch.c:37
37        nop();
(gdb) quit
%

*** That time we did hit the watchpoint on line 37 because we had forced a
*** reschedule by stopping at the intervening breakpoint...

Below are the mails I sent to the kernel list, one of which includes a
suggested patch.

Enjoy.

-- Jim

James Cownie <[EMAIL PROTECTED]>
Etnus, LLC.   +44 117 9071438
http://www.etnus.com


To: [EMAIL PROTECTED]
Subject: Watchpoint problems in 2.2 and later series x86 kernels
Reply-To: James Cownie <[EMAIL PROTECTED]>
Date: Wed, 21 Jun 2000 12:57:50 +0100
From: James Cownie <jcownie@pc2>

Folks,

Although we've now fixed one problem (the failure to store %%db6 into
the tss' version of the registers where ptrace could see it), there
remains at least one other problem with the x86 debug register
handling.

Specifically, the code in (arch/i386/kernel/traps.c) do_debug clears
%%db7 if a watchpoint trap is taken in kernel mode. (For instance as
the result of "read" writing over a watched area of store).
There is then _no_ code which restores %%db7 until the task is next
rescheduled.

The effect is that all watchpoints in the process are disabled for (at
least) the rest of the process' time-slice if the interrupted system
call does not deschedule the process.

There are two related points here

1) It would be very good to have the data watch triggered from inside
   the kernel reported to the user (or more likely the user's
   debugger), since if you're trying to find out what is stomping one
   of your variables you expect setting a watchpoint on it to tell
   you, whether or not the stomping is happening as a result of
   passing bad arguments to a system call.

   That means that we should be sending the signal, even if we then
   zero %%dr7.

2) Whether or not we send the signal, we still need to ensure that the
   value of %%dr7 reflects that from the task's tss when we get back
   to user code, so that later watchpoint traps are not ignored.

Comments, suggestions ?

-- Jim

James Cownie <[EMAIL PROTECTED]>
Etnus, Inc.   +44 117 9071438
http://www.etnus.com


To: [EMAIL PROTECTED]
Subject: Re: Watchpoint problems in 2.2 and later series x86 kernels (PATCH)
Reply-To: James Cownie <[EMAIL PROTECTED]>
Date: Fri, 23 Jun 2000 15:00:17 +0100
From: James Cownie <jcownie@pc2>

Here is a patch against 2.2.16 which ensures that watchpoints in user
code are not ignored.

Rather than restoring dr7 on every return to user space this simply
ignores the watchpoint debug trap while in kernel mode but does _not_
clear dr7. This means that we may see multiple traps if the kernel is
accessing a watchpointed area a lot, and that will be slow. However,
this will only happen to processes which have watchpoints enabled, and
having them run slowly seems preferable to penalising _all_ system
calls even when no watchpoints are enabled anywhere, which is the
alternative.

We will still not report to the user a system call overwriting watched
data, which is unpleasant. We should report that.
However that seems hard and potentially expensive.

At least this patch fixes the problem that watchpoints in user code
are disabled until the process is descheduled if the kernel had hit
one of them. Now all user code will see watchpoints, while kernel code
will skip them. This is, at least, explicable behaviour, even if not
desirable.

Enjoy.

-- Jim

James Cownie <[EMAIL PROTECTED]>
Etnus, Inc.   +44 117 9071438
http://www.etnus.com

begin 644 watchpoint-patch
M"BHJ*B!T<F%P<RYC+C(N,BXQ-@E7960@2G5N("`W(#(R.C(V.C0R(#(P,#`*
M+2TM('1R87!S+F,)1G)I($IU;B`R,R`Q-#HS-3HQ,"`R,#`P"BHJ*BHJ*BHJ
M*BHJ*BHJ*@HJ*BH@,S8W+#,W,B`J*BHJ"BTM+2`S-C<L,S<U("TM+2T*("`*
M("`)7U]A<VU?7R!?7W9O;&%T:6QE7U\H(FUO=FP@)25D8C8L)3`B(#H@(CUR
M(B`H8V]N9&ET:6]N*2D["B`@"BL@"2\J($5N<W5R92!T:&4@9&5B=6<@<W1A
M='5S(')E9VES=&5R(&ES('9I<VEB;&4@=&\@<'1R86-E("AO<B!T:&4@<')O
M8V5S<R!I='-E;&8I("HO"BL@"71S:RT^='-S+F1E8G5G<F5G6S9=(#T@8V]N
M9&ET:6]N.PHK(`H@(`DO*B!-87-K(&]U="!S<'5R:6]U<R!41B!E<G)O<G,@
M9'5E('1O(&QA>GD@5$8@8VQE87)I;F<@*B\*("`):68@*&-O;F1I=&EO;B`F
M($127U-415`I('L*("`)"2\J"BHJ*BHJ*BHJ*BHJ*BHJ*@HJ*BH@,S@R+#,Y
M-B`J*BHJ"B`@"0D)9V]T;R!C;&5A<E]41CL*("`)?0H@(`HA(`DO*B!-87-T
M(&]U="!S<'5R:6]U<R!D96)U9R!T<F%P<R!D=64@=&\@;&%Z>2!$4C<@<V5T
M=&EN9R`J+PH@(`EI9B`H8V]N9&ET:6]N("8@*$127U1205`P?$127U1205`Q
M?$127U1205`R?$127U1205`S*2D@>PH@(`D):68@*"%T<VLM/G1S<RYD96)U
M9W)E9ULW72D*("`)"0EG;W1O(&-L96%R7V1R-SL*("`)?0H@(`HA(`DO*B!)
M9B!T:&ES(&ES(&$@:V5R;F5L(&UO9&4@=')A<"P@=V4@;F5E9"!T;R!R97-E
M="!D8C<@=&\@86QL;W<@=7,@=&\@8V]N=&EN=64@<V%N96QY("HO"B$@"6EF
M("@H<F5G<RT^>&-S("8@,RD@/3T@,"D*(2`)"6=O=&\@8VQE87)?9'(W.PH@
M(`H@(`DO*B!/:RP@9FEN86QL>2!S;VUE=&AI;F<@=V4@8V%N(&AA;F1L92`J
M+PH@(`ET<VLM/G1S<RYT<F%P7VYO(#T@,3L*+2TM(#,X-2PT,3`@+2TM+0H@
M(`D)"6=O=&\@8VQE87)?5$8["B`@"7T*("`*(2`)+RH@36%S:R!O=70@<W!U
M<FEO=7,@9&5B=6<@=')A<',@9'5E('1O(&QA>GD@1%(W('-E='1I;F<@*B\*
M("`):68@*&-O;F1I=&EO;B`F("A$4E]44D%0,'Q$4E]44D%0,7Q$4E]44D%0
M,GQ$4E]44D%0,RDI('L*("`)"6EF("@A='-K+3YT<W,N9&5B=6=R96=;-UTI
M"B`@"0D)9V]T;R!C;&5A<E]D<C<["B`@"7T*("`*(2`)+RH@268@=&AI<R!I
M<R!A(&ME<FYE;"!M;V1E('1R87`L('=E(&EG;F]R92!I="X@4VEN8V4@=&AE
M(&1E8G5G"B$@"2`J(&5X8V5P=&EO;B!I<R!A('1R87`@=&AE(&EN<W1R=6-T
M:6]N(&AA<R!C;VUP;&5T960L(&%N9"!W92!C86X@:G5S="`*(2`)("H@<F5S
M=6UE(&%T('1H92!N97AT(&EN<W1R=6-T:6]N+@HA(`D@*B!)9B!W92!W97)E
M('1O(&-L96%R('1H92!D96)U9R!R96=I<W1E<BP@=&AE;B!W92!W;W5L9"!N
M965D('1O(')E<W1O<F4*(2`)("H@:70@:6X@979E<GD@<F5T=7)N('1O('5S
M97(@;&5V96P@*&]R(&UI<W,@=7-E<B!L979E;"!W871C:'!O:6YT<RDN(`HA
M(`D@*B!)9B!W92!J=7-T(&QE879E(&ET(&]N(&AE<F4@=V4@(&UA>2!T86ME
M(&$@;G5M8F5R(&]F('1R87!S(&EN('1H92!K97)N96PL(`HA(`D@*B!B=70@
M;VYL>2!F;W(@82!P<F]C97-S('=H:6-H(&AA<R!W871C:'!O:6YT<R!E;F%B
M;&5D+"!A;F0@=&AA="!C86X@"B$@"2`J(')E87-O;F%B;'D@<&%Y('-I;F-E
M(&ET(&ES(&$@<F%R92!O8V-U<G)E;F-E+@HA(`D@*B!)="!W;W5L9"!B92!G
M;V]D('1O(&)E(&%B;&4@=&\@<VEG;F%L('1H92!P<F]C97-S+"!S:6YC92!C
M;VUP;&5T96QY"B$@"2`J(&EG;F]R:6YG('1H92!T<F%P(&UE86YS('1H870@
M=V%T8VAP;VEN=&5D(&1A=&$@8V%N(&)E('-I;&5N=&QY(&]V97)W<FET=&5N
M"B$@"2`J(&)Y(&$@<WES=&5M(&-A;&PL(&AO=V5V97(@=&AA="!S965M<R!H
M87)D+@HA(`D@*B\*(2`):68@*"AR96=S+3YX8W,@)B`S*2`]/2`P*2`*(2`)
M("`@("`@("!R971U<FX["B`@"B`@"2\J($]K+"!F:6YA;&QY('-O;65T:&EN
L9R!W92!C86X@:&%N9&QE("HO"B`@"71S:RT^='-S+G1R87!?;F\@/2`Q.PH`
`
end
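
For anyone who wants the difference between the two behaviours spelled
out without decoding the attachment, here is a rough user-space sketch
of what the mails above describe. It is a toy model, not kernel code
and not the patch itself: the struct, the enum and every function name
are invented for illustration, and the only facts it encodes are the
ones stated above (a kernel-mode watchpoint trap is never reported;
the stock kernel also clears dr7, which stays clear until the next
reschedule; the patched kernel leaves dr7 alone). It does not model
gdb noticing a changed value by itself when it stops at a breakpoint.

```c
#include <stdbool.h>

/* Toy model only. All names below are hypothetical. */
struct task_model {
    unsigned long saved_dr7; /* copy kept with the task (tss.debugreg[7]) */
    unsigned long live_dr7;  /* value currently loaded in the CPU */
};

enum kernel_trap_policy {
    CLEAR_DR7,   /* stock 2.2.16 as described: clear dr7 on a kernel-mode trap */
    IGNORE_TRAP  /* proposed patch as described: ignore the trap, keep dr7 */
};

/* One write to the watched location.
 * Returns true if the trap is reported to the debugger. */
static bool watched_write(struct task_model *t, bool in_kernel,
                          enum kernel_trap_policy policy)
{
    if (t->live_dr7 == 0)
        return false;        /* watchpoint not armed: no trap at all */
    if (!in_kernel)
        return true;         /* user-mode trap: reported as usual */
    if (policy == CLEAR_DR7)
        t->live_dr7 = 0;     /* disarms all watchpoints until reschedule */
    return false;            /* kernel-mode trap is not reported */
}

/* A context switch reloads the register from the saved copy. */
static void reschedule(struct task_model *t)
{
    t->live_dr7 = t->saved_dr7;
}

/* Replays the three writes to gi in globals_test():
 * user write, kernel write (the read from /dev/zero), user write.
 * Returns how many of them are reported. */
int replay_globals_test(enum kernel_trap_policy policy, bool stop_after_read)
{
    struct task_model t = { 1UL, 1UL };
    int reported = 0;

    reported += watched_write(&t, false, policy); /* gi = 26 */
    reported += watched_write(&t, true, policy);  /* read() onto gi */
    if (stop_after_read)
        reschedule(&t);                           /* e.g. stopping at a breakpoint */
    reported += watched_write(&t, false, policy); /* gi = 41 */
    return reported;
}
```

With CLEAR_DR7 and no intervening stop the model reports only the
first write, as in the first gdb run; with a stop after the read, or
with IGNORE_TRAP, the write on line 36 is reported as well.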