https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120870
--- Comment #37 from Steven Sun <StevenSun2021 at hotmail dot com> ---
Based on the root cause above, I applied a patch to confirm it. The patch is
not claimed to be the optimal upstream solution, but it does eliminate the
crash and passes all CPython tests.
## Patch for Root-Cause Confirmation
Base GCC commit: fe440c99d6a96498d663c86c0298e1c97e24c410
CPython 3.14 commit: f4a64307a646ccb02f6fcab805fb733a910429fa
```diff
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index e73c2d7f7d0..54e5dd8d465 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -514,6 +514,25 @@ ix86_conditional_register_usage (void)
if (!fixed_regs[i] && !ix86_function_value_regno_p (i))
call_used_regs[i] = 0;
+ /* For preserve_none ABI, R12-R15 are parameter registers and are
+ not callee-saved. Mark them as call-used so the register
+ allocator won't use them for local variables, especially in
+ tail-call functions where they must hold argument values.
+ Also mark RBX as fixed when preserve_none is used. DRAP may
+ select RBX for stack realignment, and the DRAP value must survive
+ the function body for the epilogue. By fixing RBX we prevent the
+ register allocator from assigning it to any pseudo. */
+ if (cfun
+ && cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+ && TARGET_64BIT)
+ {
+ call_used_regs[R12_REG] = 1;
+ call_used_regs[R13_REG] = 1;
+ call_used_regs[R14_REG] = 1;
+ call_used_regs[R15_REG] = 1;
+ fixed_regs[BX_REG] = 1;
+ }
+
/* For 32-bit targets, disable the REX registers. */
if (! TARGET_64BIT)
{
@@ -7942,6 +7961,14 @@ find_drap_reg (void)
{
tree decl = cfun->decl;
+ /* For preserve_none ABI, R12-R15 are argument registers, not free
+ callee-saved registers. Using them as DRAP would clobber live
+ arguments flowing through tail-call jmp chains. RBX is not an
+ argument register in preserve_none, so it is safe to use as DRAP. */
+ if (TARGET_64BIT
+ && cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE)
+ return BX_REG;
+
/* Always use callee-saved register if there are no caller-saved
registers. */
if (TARGET_64BIT)
@@ -9403,8 +9430,12 @@ ix86_expand_prologue (void)
"in interrupt service routine. This may be worked "
"around by avoiding functions with aggregate return.");
- /* Only need to push parameter pointer reg if it is caller saved. */
- if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+ /* For preserve_none, always push the DRAP register so we can
+ restore the caller's value in the epilogue. The register is
+ fixed and won't be modified by the function body. */
+ if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+ || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+ && TARGET_64BIT))
{
/* Push arg pointer reg */
insn = emit_insn (gen_push (crtl->drap_reg));
@@ -10630,7 +10661,9 @@ ix86_expand_epilogue (int style)
if (ix86_static_chain_on_stack)
param_ptr_offset += UNITS_PER_WORD;
- if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+ if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+ || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+ && TARGET_64BIT))
param_ptr_offset += UNITS_PER_WORD;
insn = emit_insn (gen_rtx_SET
@@ -10647,7 +10680,9 @@ ix86_expand_epilogue (int style)
param_ptr_offset));
RTX_FRAME_RELATED_P (insn) = 1;
- if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg)))
+ if (!call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))
+ || (cfun->machine->call_saved_registers == TYPE_PRESERVE_NONE
+ && TARGET_64BIT))
ix86_emit_restore_reg_using_pop (crtl->drap_reg);
}
```
**What it does:**
1. `ix86_conditional_register_usage()`: Marks R12–R15 as call-used and RBX as
`fixed` for `preserve_none`, preventing RBX from being allocated.
2. `find_drap_reg()`: Selects `BX_REG` instead of `R13_REG` for
`preserve_none`, since R12–R15 are parameter registers.
3. `ix86_expand_prologue()`: Always pushes the DRAP register for
`preserve_none`.
4. `ix86_expand_epilogue()`: Accounts for the pushed DRAP in offset
calculations and always pops it.
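The prologue and both epilogue hunks repeat the same condition. As a rough illustration only, that condition could be factored into one predicate. The sketch below is a standalone model: `drap_reg_needs_save_p` and `struct drap_state` are hypothetical names, and the fields stand in for `TARGET_64BIT`, `cfun->machine->call_saved_registers`, and `call_used_or_fixed_reg_p (REGNO (crtl->drap_reg))`. None of it is real GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC internals (assumptions, not the real
   declarations from the i386 backend).  */
enum call_saved_registers_type { TYPE_DEFAULT, TYPE_PRESERVE_NONE };

struct drap_state
{
  bool target_64bit;                    /* models TARGET_64BIT */
  enum call_saved_registers_type saved; /* models call_saved_registers */
  bool drap_call_used_or_fixed;         /* models call_used_or_fixed_reg_p */
};

/* Hypothetical helper: true when the prologue must push the DRAP
   register and the epilogue must account for it and pop it.  */
static bool
drap_reg_needs_save_p (const struct drap_state *s)
{
  assert (s != NULL);
  /* Original condition: the register is callee-saved.  */
  if (!s->drap_call_used_or_fixed)
    return true;
  /* Condition added by the patch: 64-bit preserve_none.  */
  return s->target_64bit && s->saved == TYPE_PRESERVE_NONE;
}
```

Using one predicate at all three sites would keep the push in the prologue and the offset adjustment and pop in the epilogue from drifting apart.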
### Verification
After applying the patch and rebuilding both GCC and CPython:
- GCC compiles successfully
- CPython builds successfully with `--with-tail-call-interp` and
`-march=znver2`
- `_bootstrap_python` no longer segfaults during freeze module generation
- Interpreter tests (`test_dis`, `test_call`, `test_property`, `test_compile`,
etc.) pass
This confirms the root cause: protecting the DRAP register from allocation
ensures it survives the function body unchanged, so the epilogue computes the
correct stack pointer.
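To make the failure mode concrete, here is a toy model of the DRAP prologue/epilogue pair. It is a deliberately simplified sketch, not the code GCC emits: the `toy_*` names are hypothetical, the push/pop of the DRAP register is left out, and the 32-byte alignment is an arbitrary power of two chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state: only the two registers that matter here.  */
struct toy_regs
{
  uint64_t rsp;
  uint64_t drap; /* the register chosen as DRAP */
};

/* Prologue: remember the incoming argument pointer, then realign RSP.  */
static void
toy_prologue (struct toy_regs *r, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0); /* power of two */
  r->drap = r->rsp + 8;   /* roughly: lea 8(%rsp), %drap */
  r->rsp &= ~(align - 1); /* roughly: and $-align, %rsp  */
}

/* Epilogue: recompute RSP from whatever the DRAP register holds now.  */
static void
toy_epilogue (struct toy_regs *r)
{
  r->rsp = r->drap - 8;   /* roughly: lea -8(%drap), %rsp */
}

/* Run one function; return true if RSP is restored on exit.  */
static bool
toy_run (uint64_t incoming_rsp, bool body_clobbers_drap)
{
  struct toy_regs r = { incoming_rsp, 0 };
  toy_prologue (&r, 32);
  if (body_clobbers_drap)
    r.drap = 0xdeadbeef;  /* allocator reused the register for a pseudo */
  toy_epilogue (&r);
  return r.rsp == incoming_rsp;
}
```

In this model `toy_run (sp, false)` restores RSP, while `toy_run (sp, true)` leaves RSP derived from the clobbered value. That is the behaviour the patch prevents by fixing RBX.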
### Other Possible Solutions
**Save DRAP value on the stack**
Save the DRAP value in a stack slot during the prologue and reload it in the
epilogue, instead of relying on the register surviving.
- Pros: Does not reduce available registers.
- Cons: Requires modifying stack layout after register allocation; fragile for
CFA tracking and unwinding.
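A minimal sketch of this alternative, again as a toy model rather than backend code: the DRAP value is spilled to a slot in the frame during the prologue and reloaded before RSP is restored, so a clobbered register no longer matters. The names `slot_model` and `slot_model_run`, the slot index, and the 32-byte alignment are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { FRAME_WORDS = 4, DRAP_SLOT = 0 };

struct slot_model
{
  uint64_t rsp;
  uint64_t drap;
  uint64_t frame[FRAME_WORDS]; /* stands in for the function's stack frame */
};

/* Return true if RSP is restored even when the body clobbers DRAP.  */
static bool
slot_model_run (uint64_t incoming_rsp, bool body_clobbers_drap)
{
  assert ((incoming_rsp & 7) == 0); /* word-aligned stack pointer */
  struct slot_model m = { .rsp = incoming_rsp };

  /* Prologue: compute DRAP, realign, then spill DRAP to its slot.  */
  m.drap = m.rsp + 8;
  m.rsp &= ~(uint64_t) 31;
  m.frame[DRAP_SLOT] = m.drap;

  /* Body: the register is free for the allocator to reuse.  */
  if (body_clobbers_drap)
    m.drap = 0xdeadbeef;

  /* Epilogue: reload from the slot before restoring RSP.  */
  m.drap = m.frame[DRAP_SLOT];
  m.rsp = m.drap - 8;
  return m.rsp == incoming_rsp;
}
```

The model glosses over the hard part named in the cons above: in the real backend the slot has to exist in the frame layout and be described to the unwinder.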
**Move DRAP selection before register allocation**
Call `find_drap_reg()` during RTL expansion in `cfgexpand.cc` (before LRA) and
exclude the chosen register from allocation.
- Pros: Architecturally clean.
- Cons: Requires changes to `cfgexpand.cc`, `ira.cc`, and the x86 backend; high
regression risk.
### Summary
Root cause: The DRAP register (RBX) selected for stack realignment in
`preserve_none` functions is reassigned by the register allocator and
overwritten in the function body before the epilogue uses it to restore RSP.
Trigger: `-march=znver2` raises stack alignment to 128 bits, forcing DRAP;
`preserve_none` makes all GPRs allocatable.
Impact: Corrupted RSP propagates through tail-call `jmp` chains, causing memory
corruption and segfault.