https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120870

Steven Sun <StevenSun2021 at hotmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |StevenSun2021 at hotmail dot com

--- Comment #36 from Steven Sun <StevenSun2021 at hotmail dot com> ---
Thanks for reporting this to the CPython side. I am a CPython maintainer and
did some analysis of this bug. Here is what I found.

## Root Cause: DRAP Register Clobbered in `preserve_none` Functions

### The DRAP Mechanism

When a function needs stricter stack alignment than the ABI guarantees (e.g.,
128-bit/16-byte for AVX/AVX2 on znver2), GCC uses a **DRAP (Dynamic Realign
Argument Pointer)** register:

1. Prologue: choose a register, push it to save its previous value, set it to
a value derived from the original RSP, then align the stack.
2. Epilogue: use that register to restore RSP before returning:
`lea -offset(%drap_reg), %rsp`.

**GCC's implicit invariant:** The DRAP register value must survive unchanged
from prologue to epilogue.
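
A minimal sketch of this invariant, written as a toy Python model rather than
anything GCC actually emits. The register dictionary, the `ALIGN` value, the
sample address, and the helper names are all assumptions made for
illustration, and the `lea` offset is taken as zero to keep the model small.

```python
ALIGN = 32  # hypothetical required stack alignment, in bytes


def prologue(regs):
    """Record the incoming RSP in the DRAP register, then align RSP down."""
    regs["drap"] = regs["rsp"]
    regs["rsp"] = regs["rsp"] & -ALIGN


def epilogue(regs):
    """Restore RSP from the DRAP register (offset assumed to be zero)."""
    regs["rsp"] = regs["drap"]


# Hypothetical entry state: RSP is 8-byte aligned but not 32-byte aligned.
regs = {"rsp": 0x7FFF_FFFF_E008, "drap": 0}
entry_rsp = regs["rsp"]

prologue(regs)
aligned_rsp = regs["rsp"]   # now a multiple of ALIGN

epilogue(regs)
restored_rsp = regs["rsp"]  # equals entry_rsp only if "drap" was untouched
```

The restore step has no other source for the original RSP, since the aligned
value alone does not say how many bytes the alignment discarded. That is why
the register has to stay intact for the whole function body.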

### How `preserve_none` Breaks This Invariant

The `preserve_none` ABI marks all general-purpose registers (except RBP and
RSP) as caller-saved or parameter registers. GCC's register allocator is free
to use any of them for local temporaries without saving/restoring.

In this bug:

1. `find_drap_reg()` selects **RBX** as DRAP for a `preserve_none` function.
2. The function body (a large tail-call interpreter opcode handler) freely
modifies RBX.
3. The epilogue executes `lea -0x10(%rbx), %rsp` — but **RBX now holds a
different value** than what the prologue computed.
4. RSP becomes garbage. The corrupted stack pointer propagates through `jmp`
tail-call chains, eventually causing a segfault.
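
The four steps above can be sketched with a toy model. This is only an
illustration under stated assumptions: the entry address, the 32-byte
alignment, the `0xDEADBEEF` scratch value, and the function name `final_rsp`
are made up, and the only detail taken from the report is the `-0x10` offset
in the epilogue.

```python
ENTRY_RSP = 0x7FFF_FFFF_E008  # hypothetical stack pointer at function entry


def final_rsp(body_clobbers_rbx):
    """Return the RSP value the epilogue computes in this toy model."""
    # Prologue: RBX becomes the DRAP, derived from the entry RSP.
    rbx = ENTRY_RSP + 0x10
    # Allocate the frame and realign RSP (32 bytes is an assumed alignment).
    rsp = (ENTRY_RSP - 0x28) & -32
    if body_clobbers_rbx:
        # Body: the allocator treats RBX as an ordinary scratch register.
        rbx = 0xDEADBEEF
    # Epilogue: lea -0x10(%rbx), %rsp
    rsp = rbx - 0x10
    return rsp


intact = final_rsp(body_clobbers_rbx=False)    # RSP is restored correctly
clobbered = final_rsp(body_clobbers_rbx=True)  # RSP is derived from garbage
```

In the clobbered case the result depends only on whatever the body last wrote
to RBX, so a following `jmp` tail call starts from an arbitrary stack pointer.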

### Why `-march=znver2` Specifically Triggers It

`-march=znver2` sets `preferred_stack_boundary` to 128 bits (16 bytes), forcing
stack realignment for functions with outgoing stack arguments. Without this
flag, the default 64-bit stack boundary avoids DRAP entirely for most
functions.

That's why `-march=x86-64-v3` and `-march=x86-64-v4` also trigger it.
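
As a rough illustration of the realignment decision, the sketch below compares
a required boundary with the boundary the caller is assumed to provide. The
helper names are hypothetical, and the 128-bit and 64-bit figures simply echo
the numbers quoted above; they are not checked against GCC's x86 backend.

```python
def needs_realign(required_bits, incoming_bits):
    """Realignment is needed when the function wants more than it is given."""
    return required_bits > incoming_bits


def align_down(rsp, boundary_bits):
    """Round RSP down to the given boundary (the stack grows downward)."""
    boundary_bytes = boundary_bits // 8
    return rsp & -boundary_bytes


# Figures quoted in this comment: 128-bit requirement, 64-bit incoming boundary.
with_march = needs_realign(required_bits=128, incoming_bits=64)
without_march = needs_realign(required_bits=64, incoming_bits=64)

# Hypothetical entry RSP that is 8-byte aligned but not 16-byte aligned.
aligned = align_down(0x7FFF_FFFF_E008, 128)
```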

### Assembly Evidence

In the miscompiled function with `-march=znver2`:

```asm
push   %rbx
sub    $0x28, %rsp
...
# function body modifies RBX freely
...
lea    -0x10(%rbx), %rsp   # uses the clobbered RBX
pop    %rbx                # too late: RSP is already wrong
```

The `lea` uses RBX after it has been overwritten by the function body,
computing a garbage stack pointer.
