Issue 60972
Summary [LLC] Tail-call optimization on arm64 macOS generates incorrect spill & reload
Labels new issue
Assignees
Reporter Baltoli
# Description

I maintain a language runtime that relies on the guaranteed tail-call optimization provided by `llc`. On arm64 Macs, a seemingly correct program that runs successfully on other platforms (x86 Linux & macOS) segfaults, and the fault is caused by steps executed earlier in the program[^1].

I believe the issue can be narrowed down to incorrect stack-spilling code when calling a function with more than 8 arguments. I have minimized the issue to two LLVM programs that differ only in the number of arguments passed to a tail-called `fastcc` function (8 and 9 respectively). With 8 arguments the program executes correctly; with 9 it segfaults.
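As a rough illustration of why the ninth argument matters, the sketch below computes how many bytes of stack a call needs for `i64` arguments that do not fit in registers. It assumes the usual AArch64 convention of eight integer argument registers (`x0` to `x7`) and a 16-byte aligned stack pointer; the constant and function names are hypothetical and are not taken from LLVM.

```python
# Assumptions (not taken from the LLVM source): eight integer argument
# registers (x0..x7), 8-byte i64 slots, 16-byte stack alignment.
INT_ARG_REGISTERS = 8
SLOT_BYTES = 8
STACK_ALIGN = 16


def stack_arg_bytes(num_i64_args: int) -> int:
    """Stack bytes needed for i64 arguments that overflow the registers."""
    overflow = max(0, num_i64_args - INT_ARG_REGISTERS)
    raw = overflow * SLOT_BYTES
    # Round up to the next multiple of the stack alignment.
    return (raw + STACK_ALIGN - 1) // STACK_ALIGN * STACK_ALIGN


if __name__ == "__main__":
    for n in (8, 9):
        print(f"{n} args -> {stack_arg_bytes(n)} bytes of stack arguments")
```

Under these assumptions, 8 arguments need no stack space and 9 arguments need 16 bytes, which is consistent with the `#16` adjustments in the assembly for the failing case.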

With reference to the (heavily minimized) LLVM code and reproduction steps below, I observe the following snippets of assembly code being generated for the failing case:
```asm
_foo:                                   ; @foo
	add	sp, sp, #16
	ret

_baz:
	...                                     ; abbreviated prelude
	bl	_foo
	str	x0, [sp, #8]                    ; 8-byte Folded Spill
	sub	sp, sp, #16
	bl	_bar
	ldr	x0, [sp, #8]                    ; 8-byte Folded Reload
	...                                     ; abbreviated postlude
```

Here, the return value of `_foo` is being spilled to the stack at address `sp + 8`, which the compiler then expects to reload into `x0` after the call to `_bar`. **However**, the stack adjustment that undoes the change made by `_foo` (`sub sp, sp, #16`) sits between the spill and the reload, so `sp + 8` refers to a different address in each: the reload reads from a location 16 bytes below the one the spill wrote to.

In the successful program (8 args) below, `_foo` and `_baz` do not perform any stack adjustment, but the spill and reload offset assumptions are the same:
```asm
_foo:                                   ; @foo
	ret

_baz:
	...                                     ; abbreviated prelude
	bl	_foo
	str	x0, [sp, #8]                    ; 8-byte Folded Spill
	bl	_bar
	ldr	x0, [sp, #8]                    ; 8-byte Folded Reload
	...                                     ; abbreviated postlude
```
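The difference between the two listings can be modelled with a little stack-pointer arithmetic. The sketch below is only a model of the instruction sequences shown above, not of LLVM's frame lowering; the function name, the `callee_pop` parameter and the starting `sp` value are all hypothetical.

```python
def spill_and_reload_addresses(sp_at_call: int, callee_pop: int, slot_offset: int = 8):
    """Return (spill_address, reload_address) for the _baz sequence above.

    callee_pop is the number of bytes _foo adds to sp before returning
    (16 in the failing listing, 0 in the successful one).
    """
    sp = sp_at_call
    sp += callee_pop                  # _foo: add sp, sp, #callee_pop
    spill_address = sp + slot_offset  # _baz: str x0, [sp, #8]
    sp -= callee_pop                  # _baz: sub sp, sp, #callee_pop
    reload_address = sp + slot_offset # _baz: ldr x0, [sp, #8]
    return spill_address, reload_address


if __name__ == "__main__":
    for label, pop in (("9 args", 16), ("8 args", 0)):
        spill, reload = spill_and_reload_addresses(0x1000, pop)
        print(f"{label}: spill at {spill:#x}, reload from {reload:#x}")
```

With a 16-byte adjustment the two addresses differ by 16, so the reload does not read back the spilled value; with no adjustment they coincide, which matches the behaviour of the two programs.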

Please let me know if there is any more information it would be helpful for me to provide.

# Reproduction

The problem can be reproduced with the LLVM files and shell commands below. My machine is an M1 Mac Mini (full system information below), but I believe the issue can be reproduced on any arm64 Mac.

## Instructions

```console
$ llc -O0 -tailcallopt bad.ll -o bad.s
$ clang bad.s -o bad
$ ./bad
[1]    68198 segmentation fault  ./bad
$ echo $?
139
```

```console
$ llc -O0 -tailcallopt good.ll -o good.s
$ clang good.s -o good
$ ./good 
$ echo $?
0
```

## Code

### Failing: `bad.ll`

```llvm
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx12.0.0"

define void @bar() {
  ret void
}

define fastcc i64 @foo(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7, i64 %8) {
  ret i64 %0
}

define fastcc i64 @baz() {
entry:
  %0 = tail call fastcc i64 @foo(i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0)
  call void @bar()
  ret i64 %0
}

define i32 @main() {
entry:
  %0 = call i64 @baz()
  %1 = trunc i64 %0 to i32
  ret i32 %1
}
```

### Successful: `good.ll`

```llvm
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx12.0.0"

define void @bar() {
  ret void
}

define fastcc i64 @foo(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, i64 %7) {
  ret i64 %0
}

define fastcc i64 @baz() {
entry:
  %0 = tail call fastcc i64 @foo(i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0)
  call void @bar()
  ret i64 %0
}

define i32 @main() {
entry:
  %0 = call i64 @baz()
  %1 = trunc i64 %0 to i32
  ret i32 %1
}
```

## Environment

<details>
  <summary>System Information</summary>

```console
$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 12.3 (21E230)
      Kernel Version: Darwin 21.4.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: 44634
      User Name: administrator (administrator)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 38 days 17:45

Hardware:

    Hardware Overview:

      Model Name: Mac mini
      Model Identifier: Macmini9,1
      Chip: Apple M1
      Total Number of Cores: 8 (4 performance and 4 efficiency)
      Memory: 8 GB
      System Firmware Version: 7459.121.3
      OS Loader Version: 7459.101.2
      Serial Number (system): H2WDQ1K6Q6NV
      Hardware UUID: 92B3F9F2-F010-5367-8CAF-7716809951EE
      Provisioning UDID: 00008103-001908282440291E
      Activation Lock Status: Disabled
```
</details>

<details>
  <summary>LLVM Versions</summary>

```console
$ clang --version
Homebrew clang version 15.0.7
Target: arm64-apple-darwin21.4.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin 
$ llc --version
Homebrew LLVM version 15.0.7
  Optimized build.
  Default target: arm64-apple-darwin21.4.0
  Host CPU: apple-m1

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    avr        - Atmel AVR Microcontroller
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    hexagon    - Hexagon
    lanai      - Lanai
    mips       - MIPS (32-bit big endian)
    mips64     - MIPS (64-bit big endian)
    mips64el   - MIPS (64-bit little endian)
    mipsel     - MIPS (32-bit little endian)
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc32le    - PowerPC 32 LE
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    r600       - AMD GPUs HD2XXX-HD6XXX
    riscv32    - 32-bit RISC-V
    riscv64    - 64-bit RISC-V
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    ve         - VE
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore
```
</details>

<details>
  <summary>Homebrew</summary>

```console
$ brew info llvm
==> Downloading https://formulae.brew.sh/api/cask.json
######################################################################## 100.0%
==> llvm: stable 15.0.7 (bottled), HEAD [keg-only]
Next-gen compiler infrastructure
https://llvm.org/
/opt/homebrew/Cellar/llvm/15.0.7_1 (6,411 files, 1.3GB)
  Poured from bottle using the formulae.brew.sh API on 2023-02-17 at 11:24:42
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/llvm.rb
License: Apache-2.0 with LLVM-exception
==> Dependencies
Build: cmake ✔, swig ✘
Required: [email protected] ✘, six ✔, z3 ✔, zstd ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
To use the bundled libc++ please add the following LDFLAGS:
  LDFLAGS="-L/opt/homebrew/opt/llvm/lib/c++ -Wl,-rpath,/opt/homebrew/opt/llvm/lib/c++"

llvm is keg-only, which means it was not symlinked into /opt/homebrew,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.

If you need to have llvm first in your PATH, run:
  echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc

For compilers to find llvm you may need to set:
  export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
  export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"

==> Analytics
install: 34,000 (30 days), 133,673 (90 days), 518,755 (365 days)
install-on-request: 29,851 (30 days), 115,410 (90 days), 396,557 (365 days)
build-error: 163 (30 days)
```
</details>

[^1]: Concretely, a function in our language can be observed to return a correct / expected value, but when the result is stored in a data structure and later retrieved, it no longer points to a valid value, and the program segfaults in library code whose invariants have been violated.