Bad CC0 optimizer in VAX backend (was Re: Patches to fix GCC’s C++ exception handling on NetBSD/VAX)

Jake Hamby Mon, 28 Mar 2016 15:30:07 -0700

I have some bad news and some good news. The bad news is that there has been a 
nasty optimizer bug lurking in the VAX backend for GCC for many years, which 
has to do with over-optimistic removal of necessary tst/cmp instructions under 
certain circumstances. This manifests at -O or higher and the symptoms are 
strange behavior like stack overflows because of branches going the wrong way.


The good news is that my suspicions about the CC0 handler appear to be correct, 
and better yet, I'm currently testing a patch that isn't too big and I believe 
will fix this bug. The bad behavior is in vax_notice_update_cc (), which is 
supposed to call CC_STATUS_INIT if the CC flags have been reset, or set 
cc_status.value1 and possibly cc_status.value2 with the destination of the 
current (set...) instruction, whose signed comparison to 0 will be stored in 
the N and Z flags as a side effect (most VAX instructions do this).

The first bug is that most, but not all of the define_insn patterns in vax.md 
actually do this. The paths that emit movz* instructions will not set N and Z, 
and there are some code paths that can't be trusted because they handle a 
64-bit integer in two 32-bit pieces, for example, so N and Z won't reflect the 
desired result (comparison of operand 0 to 0).

The current version of vax_notice_update_cc has a specific check for this: it 
won't set the cc_status cache if GET_CODE (SET_DEST (exp)) != ZERO_EXTRACT. The 
problem is that this is checking the RTL expression for (zero_extract...) and 
not whether what was actually emitted is a movz* or not. That's why all of the 
define_insn()'s have to be annotated with their actual behavior vis-a-vis 
compare to 0, and then the change to vax.c to read the CC attribute makes it 
really much easier to get the correct behavior.

I need to do a lot of testing today to see if this really does help fix GCC's 
VAX backend, but I'm hopeful that this is at least a real bug that needs to be 
fixed, and that I have a fix for it that should work. The other good news is 
that you only need three cc enums: "none" (doesn't alter Z or N flags), 
"compare" (Z and N reflect comparison of operand 0 to literal 0), and "clobber" 
(call CC_STATUS_INIT to reset cc_status cache).

One thing to note is that the current version of vax.c only sets CC_NO_OVERFLOW 
on certain paths, while it should actually always set CC_NO_OVERFLOW so as to 
avoid emitting any unsigned branch instructions that would use an incorrect 
value of the C flag (which most VAX instructions do NOT set). The code that 
handles CC_NO_OVERFLOW is in final.c, and what it does is change signed 
comparisons with 0 into unsigned ones that do the same thing. (a < 0) is always 
false (for unsigned ints), (a >= 0) is always true, (a <= 0) turns into (a == 
0), and (a > 0) turns into (a != 0). Here's the patch to vax.c. I'll send the 
rest after I've tested that it does what I think it should do. The diff to 
vax.md will be larger, but it's mostly adding the "cc" attributes and not code.

Index: gcc/config/vax/vax.c
===================================================================
RCS file: /cvsroot/src/external/gpl3/gcc/dist/gcc/config/vax/vax.c,v
retrieving revision 1.15
diff -u -r1.15 vax.c
--- gcc/config/vax/vax.c        24 Mar 2016 04:27:29 -0000      1.15
+++ gcc/config/vax/vax.c        28 Mar 2016 22:28:10 -0000
@@ -1131,61 +1136,51 @@
 /* Worker function for NOTICE_UPDATE_CC.  */
 
 void
-vax_notice_update_cc (rtx exp, rtx insn ATTRIBUTE_UNUSED)
+vax_notice_update_cc (rtx exp, rtx_insn *insn)
 {
+  attr_cc cc = get_attr_cc (insn);
+
   if (GET_CODE (exp) == SET)
     {
       if (GET_CODE (SET_SRC (exp)) == CALL)
        CC_STATUS_INIT;
-      else if (GET_CODE (SET_DEST (exp)) != ZERO_EXTRACT
-              && GET_CODE (SET_DEST (exp)) != PC)
+      else if (GET_CODE (SET_DEST (exp)) != PC
+              && cc == CC_COMPARE)
        {
-         cc_status.flags = 0;
-         /* The integer operations below don't set carry or
+         /* Some of the integer operations don't set carry or
             set it in an incompatible way.  That's ok though
             as the Z bit is all we need when doing unsigned
             comparisons on the result of these insns (since
             they're always with 0).  Set CC_NO_OVERFLOW to
             generate the correct unsigned branches.  */
-         switch (GET_CODE (SET_SRC (exp)))
-           {
-           case NEG:
-             if (GET_MODE_CLASS (GET_MODE (exp)) == MODE_FLOAT)
-               break;
-           case AND:
-           case IOR:
-           case XOR:
-           case NOT:
-           case CTZ:
-           case MEM:
-           case REG:
-             cc_status.flags = CC_NO_OVERFLOW;
-             break;
-           default:
-             break;
-           }
+         cc_status.flags = CC_NO_OVERFLOW;
          cc_status.value1 = SET_DEST (exp);
          cc_status.value2 = SET_SRC (exp);
        }
+      else if (cc != CC_NONE)
+       CC_STATUS_INIT;
     }
   else if (GET_CODE (exp) == PARALLEL
           && GET_CODE (XVECEXP (exp, 0, 0)) == SET)
     {
-      if (GET_CODE (SET_SRC (XVECEXP (exp, 0, 0))) == CALL)
+      rtx exp0 = XVECEXP (exp, 0, 0);
+      if (GET_CODE (SET_SRC (exp0)) == CALL)
        CC_STATUS_INIT;
-      else if (GET_CODE (SET_DEST (XVECEXP (exp, 0, 0))) != PC)
+      else if (GET_CODE (SET_DEST (exp0)) != PC
+              && cc == CC_COMPARE)
        {
-         cc_status.flags = 0;
-         cc_status.value1 = SET_DEST (XVECEXP (exp, 0, 0));
-         cc_status.value2 = SET_SRC (XVECEXP (exp, 0, 0));
+         cc_status.flags = CC_NO_OVERFLOW;
+         cc_status.value1 = SET_DEST (exp0);
+         cc_status.value2 = SET_SRC (exp0);
        }
-      else
+      else if (cc != CC_NONE)
        /* PARALLELs whose first element sets the PC are aob,
           sob insns.  They do change the cc's.  */
        CC_STATUS_INIT;
     }
-  else
+  else if (cc != CC_NONE)
     CC_STATUS_INIT;
+
   if (cc_status.value1 && REG_P (cc_status.value1)
       && cc_status.value2
       && reg_overlap_mentioned_p (cc_status.value1, cc_status.value2))
@@ -1909,12 +1904,20 @@
     return true;
   if (indirectable_address_p (x, strict, false))
     return true;
-  xfoo0 = XEXP (x, 0);
-  if (MEM_P (x) && indirectable_address_p (xfoo0, strict, true))
-    return true;
-  if ((GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC)
-      && BASE_REGISTER_P (xfoo0, strict))
-    return true;
+  /* Note: avoid calling XEXP until needed.  It may not be a valid type.
+     This fixes an assertion failure when RTX checking is enabled.  */
+  if (MEM_P (x))
+    {
+      xfoo0 = XEXP (x, 0);
+      if (indirectable_address_p (xfoo0, strict, true))
+       return true;
+    }
+  if (GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC)
+    {
+      xfoo0 = XEXP (x, 0);
+      if (BASE_REGISTER_P (xfoo0, strict))
+       return true;
+    }
   return false;
 }
 

> On Mar 27, 2016, at 16:06, Jake Hamby <jehamby...@me.com> wrote:
> 
> The results you want to see for the test program are the following:
> 
> throwtest(0) returned 0
> throwtest(1) returned 1
> Caught int exception: 123
> Caught double exception: 123.450000
> Caught float exception: 678.900024
> enter recursive_throw(6)
> calling recursive_throw(5)
> enter recursive_throw(5)
> calling recursive_throw(4)
> enter recursive_throw(4)
> calling recursive_throw(3)
> enter recursive_throw(3)
> calling recursive_throw(2)
> enter recursive_throw(2)
> calling recursive_throw(1)
> enter recursive_throw(1)
> calling recursive_throw(0)
> enter recursive_throw(0)
> throwing exception
> Caught int exception: 456
> 
> Before I made the changes I've submitted, this worked on m68k and presumably 
> everywhere else but VAX. On VAX, it crashed due to the pointer size 
> discrepancies that I already talked about. I believe that it should be 
> possible to improve GCC's backend by allowing %ap to be used as an additional 
> general register (removing it from FIXED_REGISTERS, but leaving it in 
> CALL_USED_REGISTERS, since it's modified on CALLS), without breaking the 
> DWARF stack unwinding stuff, since the .cfi information it's emitting notes 
> the correct %fp offset to find the frame, which it actually uses instead of 
> the %ap in stack unwinding.
> 
> Gaining an additional general register to use within a function would be a 
> nice win if it worked. But there are at two problems that must be solved 
> before doing this (that I know of). The first is that certain combinations of 
> VAX instructions and addressing modes seem to have problems when %ap, %fp, 
> and/or %sp are used. I discovered this in the OpenVMS VAX Macro reference 
> (which is essentially an updated version of the 1977 VAX architecture 
> handbook), in Table 8-5, General Register Addressing.
> 
> An additional source of info on which modes fail with address faults when AP 
> or above is used, SimH's vax_cpu.c correctly handles this, and you can trace 
> these macros to find the conditions:
> 
> #define CHECK_FOR_PC    if (rn == nPC) \
>                            RSVD_ADDR_FAULT
> #define CHECK_FOR_SP    if (rn >= nSP) \
>                            RSVD_ADDR_FAULT
> #define CHECK_FOR_AP    if (rn >= nAP) \
>                            RSVD_ADDR_FAULT
> 
> So as long as the correct code is added to vax.md and vax.c to never emit 
> move instructions under the wrong circumstances when %ap is involved, it 
> could be used as a general register. I wonder if the use of %ap to find 
> address arguments is a special case that happens to never emit anything that 
> would fail (with a SIGILL, I believe).
> 
> But the other problem with making %ap available as a general (with a few 
> special cases) register is that it would break one part of the patch I 
> submitted at the beginning of the thread to fix C++ exceptions. One necessary 
> part of that fix was to change "#define FRAME_POINTER_CFA_OFFSET(FNDECL) 0" 
> to "#define ARG_POINTER_CFA_OFFSET(FNDECL) 0", which correctly generates the 
> code to return the value for __builtin_dwarf_cfa () (as an offset of 0 from 
> %ap).
> 
> When I was working on that fix, it seemed like it should be possible, since 
> the DWARF / CFA code that's in there now is using an offset from %fp that it 
> knows, that an improved fix would define FRAME_POINTER_CFA_OFFSET(FNDECL) as 
> something that knows how to return the current CFA (canonical frame address) 
> as an offset from %fp, since that's what it's using for all the .cfi_* 
> directives. But I don't know what a correct definition of 
> FRAME_POINTER_CFA_OFFSET should be in order for it to return that value, 
> instead of 0, because I know that a 0 offset from %fp is definitely wrong, 
> and it's not a fixed offset either (it depends on the number of registers 
> saved in the procedure entry mask). Fortunately, %ap points directly to CFA, 
> so that always works.
> 
> Just some thoughts on future areas for improval... I'm very happy to be able 
> to run the NetBSD testsuite on VAX now. It gives me a lot of confidence as to 
> what works and what doesn't. Most of the stuff I expected to fail (like libm 
> tests, since it's not IEEE FP) failed, and most of the rest succeeded.
> 
> -Jake
> 
> 
>> On Mar 27, 2016, at 15:34, Jake Hamby <jehamby...@me.com> wrote:
>> 
>> I'm very pleased to report that I was able to successfully build a 
>> NetBSD/vax system using the checked-in GCC 5.3, with the patches I've 
>> submitted, setting FIRST_PSEUDO_REGISTER to 17 and DWARF_FRAME_REGISTERS to 
>> 16. The kernel produced with GCC 5.3 crashes (on a real VS4000/90 and also 
>> SimH) in UVM, which may be a bug in the kernel that different optimization 
>> exposed, or a bug in GCC's generated code.
>> 
>> If you don't set DWARF_FRAME_REGISTERS to 16, then C++ exceptions break 
>> again, and GDB suddenly can't deal with the larger debug frames because of 
>> the data structure size mismatch between GCC and GDB. But with that 
>> additional define, you can raise FIRST_PSEUDO_REGISTER to include PSW (which 
>> is correct, since DWARF already uses that meaning), remove the "#ifdef 
>> notworking" around the asserts that Christos added in the NetBSD tree, and 
>> everything works as well as it did with #define FIRST_PSEUDO_REGISTER 16.
>> 
>> Here's the C++ test case I've been using to verify that the stack unwinding 
>> works and that different simple types can be thrown and caught. My ultimate 
>> goal is to be able to run GCC's testsuite because I'm far from certain that 
>> the OS, or even the majority of packages, really exercise all of the 
>> different paths in this very CISC architecture.
>> 
>> #include <string>
>> #include <stdio.h>
>> 
>> int recursive_throw(int i) {
>> printf("enter recursive_throw(%d)\n", i);
>> if (i > 0) {
>>   printf("calling recursive_throw(%d)\n", i - 1);
>>   recursive_throw(i - 1);
>> } else {
>>   printf("throwing exception\n");
>>   throw 456;
>> }
>> printf("exit recursive_throw(%d)\n", i);
>> }
>> 
>> /* Test several kinds of throws. */
>> int throwtest(int test) {
>> switch (test) {
>>   case 0:
>>   case 1:
>>     return test;
>> 
>>   case 2:
>>     throw 123;
>> 
>>   case 3:
>>     throw 123.45;
>> 
>>   case 4:
>>     throw 678.9f;
>> 
>>   case 5:
>>     recursive_throw(6);
>>     return 666;  // fail
>> 
>>   default:
>>     return 999;  // not used in test
>> }
>> }
>> 
>> int main() {
>> for (int i = 0; i < 6; i++) {
>>   try {
>>     int ret = throwtest(i);
>>     printf("throwtest(%d) returned %d\n", i, ret);
>>   } catch (int e) {
>>     printf("Caught int exception: %d\n", e);
>>   } catch (double d) {
>>     printf("Caught double exception: %f\n", d);
>>   } catch (float f) {
>>     printf("Caught float exception: %f\n", (double)f);
>>   }
>> }
>> }
>> 
>> I'm pleased that I got it working, but the change I made to except.c to add:
>> 
>> RTX_FRAME_RELATED_P (insn) = 1;
>> 
>> below:
>> 
>> #ifdef EH_RETURN_HANDLER_RTX
>>     rtx insn = emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
>> 
>> isn't really correct, I don't think. It adds an additional .cfi directive 
>> that wasn't there before, and a GCC ./buildsh release fails building 
>> unwind-dw2.c (that's the place where the build either succeeds or fails or 
>> generates bad code) if you try to compile with assertions (and it doesn't 
>> without my change to except.c).
>> 
>> Unfortunately, I don't have a patch for the root cause for me having to add 
>> that line to except.c, which is that the required mov instruction to copy 
>> the __builtin_eh_return() return address into the old stack frame is being 
>> deleted for some reason, otherwise. Since I know the #ifdef 
>> EH_RETURN_HANDLER_RTX line *is* staying in the final code on other archs, I 
>> presume the problem is something VAX-related in the .md file.
>> 
>> If anyone knows about GCC's liveness detection, and specifically any 
>> potential problems that might cause this to be happening (removing a 
>> required "emit_move_insn (EH_RETURN_HANDLER_RTX, ...)" before a return call 
>> that was added in expand_eh_return () but then deleted if -O or higher is 
>> used), any advice would be appreciated as to where to look.
>> 
>> What I'm working on now is cleaning up and refactoring the .md insn 
>> definitions, but I'm not ready to share that code until it works and does 
>> something useful. I'm hoping to be able to optimize the removal of unneeded 
>> tst / cmp instructions when the condition codes have been set to something 
>> useful by a previous insn. I don't think the code in vax_notice_update_cc () 
>> is necessarily correct, and it's very difficult to understand exactly how 
>> it's working, because it's trying to determine this based entirely on 
>> looking at the RTL of the insn (set, call, zero_extract, etc), which I think 
>> may have become out of sync with the types of instructions that are actually 
>> emitted in vax.md for those operations.
>> 
>> So I've just finished tagging the define_insn calls in vax.md with a "cc" 
>> attribute (like the avr backend) which my (hopefully more correct and more 
>> optimized) version of vax_notice_update_cc will use to know exactly what the 
>> flag status is after the insn, for Z, N, and C. Here's my definition of 
>> "cc". I'll share the rest after I'm sure that it works.
>> 
>> ;; Condition code settings.  On VAX, the default value is "clobber".
>> ;; The V flag is often set to zero, or else it has a special meaning,
>> ;; usually related to testing for a signed integer range overflow.
>> ;; "cmp_czn", "cmp_zn", and "cmp_z" are all assumed to modify V, and
>> ;; none is expected to modify C.  "plus" is used after an add/sub,
>> ;; when the flags, including C, return info about the result,
>> ;; including a carry bit to use with, e.g. "adwc".
>> 
>> ;; The important thing to note is that the C flag, in the case of "plus",
>> ;; doesn't reflect the results of a signed integer comparison,
>> ;; as "cmp_czn" does.  Finally, "cmp_zn_use_c" and "cmp_z_use_cn" indicate
>> ;; that all four flags are updated: Z and N, or Z alone, will be a 
>> comparison,
>> ;; but C is set to 0, or some other value, instead of retaining its old 
>> value.
>> ;; Only instructions of type "compare" set the C, Z, and N flags correctly
>> ;; for both signed and unsigned ordered comparisons.
>> 
>> ;; For branching, only Z is needed to test for equality, between two
>> ;; operands or between the first operand and 0.  The N flag is necessary
>> ;; to test for less than zero, and for FP or signed integer comparisons.
>> ;; Both N and Z are required to branch on signed (a <= b) or (a > b).
>> ;; For comparing unsigned integers, the C flag is used instead of N.
>> ;; Both C and Z are required to branch on unsigned (a <= b) or (a > b).
>> 
>> ;; The VAX instruction set is biased towards leaving C alone, relative to
>> ;; the other flags, which tend to be modified on almost every instruction.
>> ;; It's possible to cache the results of two signed int comparison,
>> ;; as long as they're of the form (a < b) or (a >= b), where b may be 0,
>> ;; in the C flag, while other instructions modify the other flags. Then,
>> ;; a test for a branch can be saved later by referring to the previous value.
>> ;; The cc attributes are intended so that this optimization may be performed.
>> 
>> (define_attr "cc" "none,cmp_czn,cmp_zn,cmp_zn_use_c,
>>                      cmp_z,cmp_z_use_czn,plus,clobber"
>> (const_string "clobber"))
>> 
>> 
>> I'll send another email once I have something substantial to contribute. I 
>> give my permission to NetBSD and the FSF to integrate any or all of my 
>> changes under the copyright terms of the original files. Please let me know 
>> if I have to do any additional legal stuff for code contributions. I'm doing 
>> this on my own time and not on behalf of any company or organization.
>> 
>> Best regards,
>> Jake
>> 
>> 
>>> On Mar 26, 2016, at 07:38, Jake Hamby <jehamby...@me.com> wrote:
>>> 
>>> Unfortunately, my previous patch that included a change to 
>>> gcc/config/vax/vax.h that increased FIRST_PSEUDO_REGISTER from 16 to 17 
>>> breaks the C++ exception handling that I’d worked so hard to get right with 
>>> the rest of the patch. I believe I need to define DWARF_FRAME_REGISTERS to 
>>> 16 in the same file to fix the size of the array that libgcc/unwind-dw2.c 
>>> creates. The i386 backend and several others also define it their .h file 
>>> for the same reason (compatibility with hardcoded frame offsets).
>>> 
>>> Here’s the first part of the patch to vax.h that increases 
>>> FIRST_PSEUDO_REGISTER and also adds a definition of DWARF_FRAME_REGISTERS 
>>> as 16, with suitable comment. I’m testing it now. I know that C++ 
>>> exceptions were working before I increased FIRST_PSEUDO_REGISTER to 17.
>>> 
>>> Regards,
>>> Jake
>>> 
>>> Index: external/gpl3/gcc.old/dist/gcc/config/vax/vax.h
>>> ===================================================================
>>> RCS file: /cvsroot/src/external/gpl3/gcc.old/dist/gcc/config/vax/vax.h,v
>>> retrieving revision 1.3
>>> diff -u -r1.3 vax.h
>>> --- external/gpl3/gcc.old/dist/gcc/config/vax/vax.h 23 Sep 2015 03:39:18 
>>> -0000      1.3
>>> +++ external/gpl3/gcc.old/dist/gcc/config/vax/vax.h 26 Mar 2016 14:34:29 
>>> -0000
>>> @@ -119,13 +119,17 @@
>>>  The hardware registers are assigned numbers for the compiler
>>>  from 0 to just below FIRST_PSEUDO_REGISTER.
>>>  All registers that the compiler knows about must be given numbers,
>>> -   even those that are not normally considered general registers.  */
>>> -#define FIRST_PSEUDO_REGISTER 16
>>> +   even those that are not normally considered general registers.
>>> +   This includes PSW, which the VAX backend did not originally include.  */
>>> +#define FIRST_PSEUDO_REGISTER 17
>>> +
>>> +/* For compatibility, DWARF_FRAME_REGISTERS must still be 16.  */
>>> +#define DWARF_FRAME_REGISTERS 16
>>> 
>>> /* 1 for registers that have pervasive standard uses
>>>  and are not available for the register allocator.
>>> -   On the VAX, these are the AP, FP, SP and PC.  */
>>> -#define FIXED_REGISTERS {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1}
>>> +   On the VAX, these are the AP, FP, SP, PC, and PSW.  */
>>> +#define FIXED_REGISTERS {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1}
>>> 
>>> /* 1 for registers not available across function calls.
>>>  These must include the FIXED_REGISTERS and also any
>>> 
>> 
>

Bad CC0 optimizer in VAX backend (was Re: Patches to fix GCC’s C++ exception handling on NetBSD/VAX)

Reply via email to