https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120870

--- Comment #37 from Steven Sun <StevenSun2021 at hotmail dot com> ---
Based on the root cause above, I applied a patch to confirm it. The patch is
not claimed to be the optimal upstream solution, but it does eliminate the
crash and passes all CPython tests.

## Patch for Root-Cause Confirmation

Base GCC commit: fe440c99d6a96498d663c86c0298e1c97e24c410
CPython 3.14 commit: f4a64307a646ccb02f6fcab805fb733a910429fa

```diff
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index e73c2d7f7d0..54e5dd8d465 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -514,6 +514,25 @@ ix86_conditional_register_usage (void)
       if (!fixed_regs[i] && !ix86_function_value_regno_p (i))
        call_used_regs[i] = 0;

+  /* For preserve_none ABI, R12-R15 are parameter registers and are
+     not callee-saved.  Mark them as call-used so the register
+     allocator won't use them for local variables, especially in
+     tail-call functions where they must hold argument values.
+     Also mark RBX as fixed when preserve_none is used.  DRAP may
+     select RBX for stack realignment, and the DRAP value must survive
+     the function body for the epilogue.  By fixing RBX we prevent the
+     register allocator from assigning it to any pseudo.  */
+  if (cfun
+      && cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+      && TARGET_64BIT)
+    {
+      call_used_regs[R12_REG] = 1;
+      call_used_regs[R13_REG] = 1;
+      call_used_regs[R14_REG] = 1;
+      call_used_regs[R15_REG] = 1;
+      fixed_regs[BX_REG] = 1;
+    }
+
   /* For 32-bit targets, disable the REX registers.  */
   if (! TARGET_64BIT)
     {
@@ -7942,6 +7961,14 @@ find_drap_reg (void)
 {
   tree decl = cfun->decl;

+  /* For preserve_none ABI, R12-R15 are argument registers, not free
+     callee-saved registers.  Using them as DRAP would clobber live
+     arguments flowing through tail-call jmp chains.  RBX is not an
+     argument register in preserve_none, so it is safe to use as DRAP.  */
+  if (TARGET_64BIT
+      && cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE)
+    return BX_REG;
+
   /* Always use callee-saved register if there are no caller-saved
      registers.  */
   if (TARGET_64BIT)
@@ -9403,8 +9430,12 @@ ix86_expand_prologue (void)
               "in interrupt service routine.  This may be worked "
               "around by avoiding functions with aggregate return.");

-      /* Only need to push parameter pointer reg if it is caller saved.  */
-      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+      /* For preserve_none, always push the DRAP register so we can
+        restore the caller's value in the epilogue.  The register is
+        fixed and won't be modified by the function body.  */
+      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+         || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+             && TARGET_64BIT))
        {
          /* Push arg pointer reg */
          insn = emit_insn (gen_push (crtl->drap_reg));
@@ -10630,7 +10661,9 @@ ix86_expand_epilogue (int style)

       if (ix86_static_chain_on_stack)
        param_ptr_offset += UNITS_PER_WORD;
-      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+         || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+             && TARGET_64BIT))
        param_ptr_offset += UNITS_PER_WORD;

       insn = emit_insn (gen_rtx_SET
@@ -10647,7 +10680,9 @@ ix86_expand_epilogue (int style)
                           param_ptr_offset));
       RTX_FRAME_RELATED_P (insn) = 1;

-      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+      if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+         || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+             && TARGET_64BIT))
        ix86_emit_restore_reg_using_pop (crtl->drap_reg);
     }
```

**What it does:**
1. `ix86_conditional_register_usage()`: Marks R12–R15 as call-used and RBX as
`fixed` for `preserve_none`, preventing RBX from being allocated.
2. `find_drap_reg()`: Selects `BX_REG` instead of `R13_REG` for
`preserve_none`, since R12–R15 are parameter registers.
3. `ix86_expand_prologue()`: Always pushes the DRAP register for
`preserve_none`.
4. `ix86_expand_epilogue()`: Accounts for the pushed DRAP in offset
calculations and always pops it.
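
The push in the prologue, the offset adjustment in the epilogue, and the pop in the epilogue all have to be guarded by the same condition, otherwise the stack ends up unbalanced. Below is a minimal standalone C sketch of that invariant. It is not GCC code: `struct fn_state`, `drap_is_saved`, and the other names are hypothetical stand-ins for the backend state tested in the diff, and the one-word base of the offset is an assumption about the surrounding epilogue code, which the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for UNITS_PER_WORD on x86-64 (assumed to be 8 bytes). */
enum { WORD_BYTES = 8 };

/* Hypothetical summary of the state the patch tests; not a GCC type. */
struct fn_state {
    bool drap_call_used_or_fixed; /* models call_used_or_fixed_reg_p (DRAP) */
    bool preserve_none;           /* models TYPE_PRESERVE_NONE */
    bool target_64bit;            /* models TARGET_64BIT */
    bool static_chain_on_stack;   /* models ix86_static_chain_on_stack */
};

/* The patched condition, shared by all three sites in the diff. */
bool drap_is_saved(const struct fn_state *s)
{
    return !s->drap_call_used_or_fixed
           || (s->preserve_none && s->target_64bit);
}

/* Bytes the prologue pushes to save the caller's DRAP value. */
int prologue_drap_push_bytes(const struct fn_state *s)
{
    return drap_is_saved(s) ? WORD_BYTES : 0;
}

/* Models param_ptr_offset in the epilogue.  The one-word base is an
   assumption, not something visible in the diff.  */
int epilogue_param_ptr_offset(const struct fn_state *s)
{
    int offset = WORD_BYTES;
    if (s->static_chain_on_stack)
        offset += WORD_BYTES;
    if (drap_is_saved(s))
        offset += WORD_BYTES;
    return offset;
}

/* Every byte pushed by the prologue must be accounted for in the
   epilogue offset.  */
void check_balanced(const struct fn_state *s)
{
    int expected = WORD_BYTES
                   + (s->static_chain_on_stack ? WORD_BYTES : 0)
                   + prologue_drap_push_bytes(s);
    assert(epilogue_param_ptr_offset(s) == expected);
}
```

In this model, a `preserve_none` 64-bit function whose DRAP is a fixed register still reports `drap_is_saved() == true`, which mirrors the extra `||` clause the patch adds at each site.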

### Verification

After applying the patch and rebuilding both GCC and CPython:
- GCC compiles successfully
- CPython builds successfully with `--with-tail-call-interp` and
`-march=znver2`
- `_bootstrap_python` no longer segfaults during freeze module generation
- Interpreter tests (`test_dis`, `test_call`, `test_property`, `test_compile`,
etc.) pass

This confirms the root cause: protecting the DRAP register from allocation
ensures it survives the function body unchanged, so the epilogue computes the
correct stack pointer.
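
To make the failure mode concrete, here is a toy C model of the mechanism. It is a simplified sketch and not the code GCC emits: `struct toy_cpu` and the `toy_*` functions are hypothetical, the prologue and epilogue are reduced to the arithmetic that matters, and the alignment value is an arbitrary power of two supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state: only the two registers that matter here. */
struct toy_cpu {
    uint64_t rsp;  /* stack pointer */
    uint64_t drap; /* register chosen as DRAP */
};

/* Prologue: record the incoming argument pointer in DRAP, then align
   RSP down.  align must be a power of two.  */
void toy_prologue(struct toy_cpu *cpu, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    cpu->drap = cpu->rsp + 8;  /* address just above the return address */
    cpu->rsp &= ~(align - 1);  /* realign the stack */
}

/* Body: if clobber_drap is set, the register allocator has reused
   DRAP for an unrelated value.  */
void toy_body(struct toy_cpu *cpu, bool clobber_drap, uint64_t junk)
{
    if (clobber_drap)
        cpu->drap = junk;
}

/* Epilogue: recompute RSP from DRAP, trusting it is unchanged. */
void toy_epilogue(struct toy_cpu *cpu)
{
    cpu->rsp = cpu->drap - 8;
}

/* Returns true when RSP on exit equals RSP on entry. */
bool toy_run(uint64_t entry_rsp, uint64_t align, bool clobber_drap)
{
    struct toy_cpu cpu = { entry_rsp, 0 };
    toy_prologue(&cpu, align);
    toy_body(&cpu, clobber_drap, UINT64_C(0xdeadbeef));
    toy_epilogue(&cpu);
    return cpu.rsp == entry_rsp;
}
```

With `clobber_drap` false the model returns to the entry RSP. With it true, the epilogue derives RSP from the junk value, which is the behaviour the patch prevents by fixing RBX.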

### Other Possible Solutions

**Save DRAP value on the stack**
Save the DRAP value in a stack slot during the prologue and reload it in the
epilogue, instead of relying on the register surviving.
- Pros: Does not reduce available registers.
- Cons: Requires modifying stack layout after register allocation; fragile for
CFA tracking and unwinding.
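
A rough C sketch of the stack-slot idea, under the same toy assumptions: this is not GCC code, all names are hypothetical, and `drap_slot` stands in for a stack slot that the epilogue can still address after realignment (for example via a frame pointer, which this model assumes is available and intact).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy frame: a DRAP register plus a spill slot for its value. */
struct toy_frame {
    uint64_t rsp;       /* stack pointer */
    uint64_t drap;      /* DRAP register; the body may clobber it */
    uint64_t drap_slot; /* models a stack slot at a fixed frame offset */
};

/* Prologue: compute DRAP, spill it to the slot, then align RSP.
   align must be a power of two.  */
void slot_prologue(struct toy_frame *f, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    f->drap = f->rsp + 8;
    f->drap_slot = f->drap;
    f->rsp &= ~(align - 1);
}

/* Body: the register is treated as freely allocatable, so it is
   always overwritten in this model.  */
void slot_body(struct toy_frame *f, uint64_t junk)
{
    f->drap = junk;
}

/* Epilogue: reload DRAP from the slot before using it. */
void slot_epilogue(struct toy_frame *f)
{
    f->drap = f->drap_slot;
    f->rsp = f->drap - 8;
}

/* Returns true when RSP on exit equals RSP on entry. */
bool slot_run(uint64_t entry_rsp, uint64_t align)
{
    struct toy_frame f = { entry_rsp, 0, 0 };
    slot_prologue(&f, align);
    slot_body(&f, UINT64_C(0xdeadbeef));
    slot_epilogue(&f);
    return f.rsp == entry_rsp;
}
```

The model restores RSP even though the register is clobbered, at the cost of one extra store and load. It says nothing about the CFA and unwinding concerns listed above, which are the hard part in a real implementation.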

**Move DRAP selection before register allocation**
Call `find_drap_reg()` during `cfgexpand.cc` (before LRA), excluding the chosen
register from allocation.
- Pros: Architecturally clean.
- Cons: Requires changes to `cfgexpand.cc`, `ira.cc`, and the x86 backend; high
regression risk.

### Summary

Root cause: DRAP register (RBX) selected for stack realignment in
`preserve_none` functions is clobbered by the register allocator before the
epilogue uses it to restore RSP.

Trigger: `-march=znver2` raises stack alignment to 128 bits, forcing DRAP;
`preserve_none` makes all GPRs allocatable.

Impact: Corrupted RSP propagates through tail-call `jmp` chains, causing memory
corruption and segfault.
