https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013

            Bug ID: 121013
           Summary: Possible miscompilation triggered by
                    __builtin_stack_address() at `-O3`.
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moorabbit at proton dot me
  Target Milestone: ---

While working on implementing __builtin_stack_address() for Clang, I noticed
that GCC can produce different results for this builtin depending on whether
optimizations are enabled.

Context
----------
$ ~/gcc-15.1.0/bin/gcc -v

Target: x86_64-pc-linux-gnu
Configured with: ./configure --prefix=~/gcc-15.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.1.0 (GCC)

$ cat /tmp/main.c
extern void f(int, int, long, long, long, long, long, long);

void *a() {
        f(1, 2, 3, 4, 5, 6, 7, 8);
        return __builtin_stack_address();
}

With optimizations disabled (-O0)
-------------------------------------------
$ ~/gcc-15.1.0/bin/gcc -O0 -c /tmp/main.c -o /tmp/main.o && objdump -d /tmp/main.o

/tmp/main.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   6a 08                   push   $0x8
   6:   6a 07                   push   $0x7
   8:   41 b9 06 00 00 00       mov    $0x6,%r9d
   e:   41 b8 05 00 00 00       mov    $0x5,%r8d
  14:   b9 04 00 00 00          mov    $0x4,%ecx
  19:   ba 03 00 00 00          mov    $0x3,%edx
  1e:   be 02 00 00 00          mov    $0x2,%esi
  23:   bf 01 00 00 00          mov    $0x1,%edi
  28:   e8 00 00 00 00          call   2d <a+0x2d>
  2d:   48 83 c4 10             add    $0x10,%rsp
  31:   48 89 e0                mov    %rsp,%rax
  34:   c9                      leave
  35:   c3                      ret

With optimizations enabled (-O3)
-------------------------------------------
$ ~/gcc-15.1.0/bin/gcc -O3 -c /tmp/main.c -o /tmp/main.o && objdump -d /tmp/main.o

/tmp/main.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   41 b9 06 00 00 00       mov    $0x6,%r9d
   a:   41 b8 05 00 00 00       mov    $0x5,%r8d
  10:   b9 04 00 00 00          mov    $0x4,%ecx
  15:   6a 08                   push   $0x8
  17:   ba 03 00 00 00          mov    $0x3,%edx
  1c:   be 02 00 00 00          mov    $0x2,%esi
  21:   bf 01 00 00 00          mov    $0x1,%edi
  26:   6a 07                   push   $0x7
  28:   e8 00 00 00 00          call   2d <a+0x2d>
  2d:   48 89 e0                mov    %rsp,%rax
  30:   48 83 c4 18             add    $0x18,%rsp
  34:   c3                      ret

Issue
-------
The issue is in the order of the `mov %rsp, %rax` and `add _, %rsp`
instructions.

At -O0, we first pop the two pushed arguments by adding $0x10 to %rsp at
<a+0x2d>. Only then do we save %rsp to %rax at <a+0x31> and return from the
procedure.

At -O3, we first save %rsp to %rax at <a+0x2d>, while the two pushed arguments
are still on the stack. We then add $0x18 to %rsp at <a+0x30>, which releases
both the arguments and the 8 bytes reserved by `sub $0x8,%rsp`, and return from
the procedure.
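
To make the difference concrete, here is a small model of the two instruction
sequences. It is only a sketch, not anything taken from GCC: it tracks %rsp as
a byte offset from its value on entry to a(), and it assumes that f() returns
with %rsp where it was just before the `call`. The function names are made up
for illustration.

```c
#include <assert.h>

/* Offsets are in bytes, relative to %rsp on entry to a().
   The stack grows downwards, so pushes make the offset more negative. */

/* -O0 listing: the arguments are popped before %rsp is copied to %rax. */
long rax_offset_O0(void) {
    long rsp = 0;
    rsp -= 8;       /* push %rbp                          */
    rsp -= 8;       /* push $0x8  (8th argument)          */
    rsp -= 8;       /* push $0x7  (7th argument)          */
                    /* call f: assumed net zero on %rsp   */
    rsp += 0x10;    /* add $0x10,%rsp                     */
    return rsp;     /* mov %rsp,%rax                      */
}

/* -O3 listing: %rsp is copied to %rax before the arguments are popped. */
long rax_offset_O3(void) {
    long rsp = 0;
    long rax;
    rsp -= 8;       /* sub $0x8,%rsp                      */
    rsp -= 8;       /* push $0x8  (8th argument)          */
    rsp -= 8;       /* push $0x7  (7th argument)          */
                    /* call f: assumed net zero on %rsp   */
    rax = rsp;      /* mov %rsp,%rax                      */
    rsp += 0x18;    /* add $0x18,%rsp                     */
    (void)rsp;      /* final %rsp is not part of the result */
    return rax;
}

/* Sanity check of the model against the listings. */
void check_model(void) {
    assert(rax_offset_O0() == -8);
    assert(rax_offset_O3() == -24);
}
```

Under this model the -O3 value is 16 bytes lower than the -O0 value, which is
exactly the size of the two 8-byte arguments pushed for f().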

What's the right behavior?
---------------------------------
The comment within the `static rtx expand_builtin_stack_address()` procedure
located in gcc/builtins.cc states:

```[...] the outgoing on-stack arguments pushed temporarily for a call are
regarded as part of the callee's stack range, rather than the caller's.```

By that definition the codegen at -O3 is incorrect: at the moment %rsp is
saved (<a+0x2d>), it still covers the space used by the temporarily pushed
on-stack arguments. At -O0 those arguments have already been popped by then.
-O1 and -O2 have the same problem as -O3.
