Issue 185361
Summary [MC][AArch64] Clang crashes when assembling AArch64 `ext` with symbol as immediate operand
Labels clang
Assignees
Reporter venkyqz
## Summary

`llvm-mc` (debug build) crashes with an assertion failure when assembling the AArch64 Advanced SIMD `ext` (vector extract) instruction if the byte-index immediate operand is specified as a symbol reference (e.g., `f0`, `NaN`, `Inf`). The release build silently emits wrong object code with the byte-index field set to 0.

---

## Reproduction

**Godbolt Link**
+ https://godbolt.org/z/MhYY85j83

**Test File (`poc.s`):**

```asm
.text
ext v0.8b, v1.8b, v2.8b, f0
```

**Commands:**

```bash
# Debug Build - Crashes with assertion
echo -e ".text\next v0.8b, v1.8b, v2.8b, f0" | llvm-mc - \
  --arch=aarch64 --triple=aarch64-linux-gnu --filetype=obj -o /dev/null
# Exit 134, Assertion failed

# Release Build - Silent miscompilation
echo -e ".text\next v0.8b, v1.8b, v2.8b, f0" | llvm-mc - \
  --arch=aarch64 --triple=aarch64-linux-gnu --filetype=obj -o /dev/null
# Exit 0, byte-index field set to 0
```

**Debug Build Output:**

```
llvm-mc: AArch64MCCodeEmitter.cpp:239:
Assertion `MO.isImm() && "did not expect relocated expression"' failed.

Stack dump:
  #9 (anonymous namespace)::AArch64MCCodeEmitter::getMachineOpValue(...)
         AArch64MCCodeEmitter.cpp:239
  #10 (anonymous namespace)::AArch64MCCodeEmitter::getBinaryCodeForInstr(...)
         AArch64GenMCCodeEmitter.inc:11533
```

---

## Root Cause

The `ext` instruction is defined in `AArch64InstrFormats.td` via `BaseSIMDBitwiseExtract`:

```tablegen
class BaseSIMDBitwiseExtract<...> : I<
  (outs regtype:$Rd), (ins regtype:$Rn, regtype:$Rm, i32imm:$imm), ...
> {
  bits<4> imm;
  let Inst{14-11} = imm;
}
```

The `i32imm` operand type has **no `ParserMatchClass`** restriction. Any identifier, including register-like names (`f0`-`f31`), C-style constants (`NaN`, `Inf`), and labels, is accepted at parse time as an `MCExpr` symbol reference.
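
The distinction that the parser does not enforce can be illustrated with a small standalone sketch. This is not LLVM code: `OperandKind` and `classifyOperand` are hypothetical names, and only decimal literals are recognised. It simply separates a literal byte index from a bare identifier, which is the check a restrictive match class would presumably perform before the operand ever reaches the encoder.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical operand kinds for illustration; these are not LLVM types.
enum class OperandKind { Immediate, Symbol, Invalid };

// Classify the text of the last `ext` operand.
// "#2" or "2" is a decimal immediate; "f0", "NaN", "Inf" are bare identifiers.
// Simplification: hex literals and compound expressions are reported as Invalid.
OperandKind classifyOperand(std::string_view tok) {
  if (!tok.empty() && tok.front() == '#')
    tok.remove_prefix(1);  // optional immediate marker
  if (tok.empty())
    return OperandKind::Invalid;

  bool allDigits = true;
  bool identChars = true;
  for (unsigned char c : tok) {
    if (!std::isdigit(c))
      allDigits = false;
    if (!(std::isalnum(c) || c == '_' || c == '.' || c == '$'))
      identChars = false;
  }
  if (allDigits)
    return OperandKind::Immediate;

  // An identifier may not start with a digit.
  const bool startsLikeIdent =
      !std::isdigit(static_cast<unsigned char>(tok.front()));
  return (identChars && startsLikeIdent) ? OperandKind::Symbol
                                         : OperandKind::Invalid;
}
```

With this toy classifier, `classifyOperand("#2")` yields `OperandKind::Immediate`, while `classifyOperand("f0")` and `classifyOperand("NaN")` yield `OperandKind::Symbol`, the case that currently slips through to the encoder.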

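At encoding time, `getMachineOpValue` only asserts `isImm()` and never checks `isExpr()`. A symbol operand therefore trips the assertion in a debug build, and in a release build, where the assertion is compiled out, falls through to `getImm()`: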

```cpp
// AArch64MCCodeEmitter.cpp:231-240
unsigned getMachineOpValue(const MCInst &MI, const MCOperand &MO, ...) const {
  if (MO.isReg())
    return Ctx.getRegisterInfo()->getEncodingValue(MO.getReg());

  assert(MO.isImm() && "did not expect relocated expression");
  return static_cast<unsigned>(MO.getImm());
}
```
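
The missing guard can be sketched on a toy operand type. This is a hedged, standalone illustration rather than a proposed patch: `ToyOperand` and `encodeByteIndex` are hypothetical, the real `MCOperand` API is not used, and the index limits (7 for the `.8b` form, 15 for the `.16b` form) are taken from the Arm architecture's definition of `EXT`, not from this report. The point is that an explicit check still runs in release builds, whereas `assert` compiles to nothing under `NDEBUG`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Toy stand-in for an assembler operand: either a resolved immediate
// or an unresolved symbol name. Hypothetical; not llvm::MCOperand.
using ToyOperand = std::variant<std::int64_t, std::string>;

// Returns the byte index to place in the 4-bit field, or std::nullopt when
// the operand cannot be encoded (symbol reference or out-of-range value).
// maxIndex is assumed to be 7 for the .8b form and 15 for the .16b form.
std::optional<unsigned> encodeByteIndex(const ToyOperand &op,
                                        unsigned maxIndex) {
  const auto *imm = std::get_if<std::int64_t>(&op);
  if (!imm)
    return std::nullopt;  // symbol operand: report it instead of asserting
  if (*imm < 0 || *imm > static_cast<std::int64_t>(maxIndex))
    return std::nullopt;  // immediate does not fit the instruction form
  return static_cast<unsigned>(*imm);
}
```

A caller would turn the empty result into a diagnostic, so the failure is visible in both debug and release builds instead of depending on whether assertions are enabled.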

---

## Impact

| Build Type | Behavior | Detectability |
|------------|----------|---------------|
| Debug | Assertion failure (Exit 134) | Detectable |
| Release | Silent miscompilation (Exit 0) | **Undetectable in CI/CD** |

**Affected Instructions:**

| Instruction | Debug exit code | Release exit code |
|-------------|-----------------|-------------------|
| `ext v0.8b, v1.8b, v2.8b, f0` | 134 | 0 |
| `ext v0.16b, v1.16b, v2.16b, f0` | 134 | 0 |
| `ext v0.8b, v1.8b, v2.8b, NaN` | 134 | 0 |
| `ext v0.8b, v1.8b, v2.8b, Inf` | 134 | 0 |

**Release Behavior:**
```
# Correct (index=2):
ext v0.8b, v1.8b, v2.8b, #2 // encoding: [0x20,0x10,0x02,0x2e]

# Wrong (index=0, symbol 'f0' treated as 0):
ext v0.8b, v1.8b, v2.8b, f0 // encoding: [0x20,0x00,0x02,0x2e]
```
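
The two encodings differ only in bits 14-11, the `imm` field of `BaseSIMDBitwiseExtract`. As a cross-check, the following standalone sketch packs the fields by hand. The base opcode `0x2E000000` and the positions of `Q`, `Rm`, `Rn`, and `Rd` are assumptions taken from the Arm architecture's `EXT` encoding rather than from this report, and `encodeExt` and `toLittleEndianBytes` are hypothetical helpers, not LLVM functions.

```cpp
#include <array>
#include <cstdint>

// Pack an Advanced SIMD EXT instruction word.
// Assumed field layout: bit 30 = Q (0 for .8b, 1 for .16b),
// bits 20-16 = Rm, bits 14-11 = imm4, bits 9-5 = Rn, bits 4-0 = Rd.
constexpr std::uint32_t encodeExt(unsigned rd, unsigned rn, unsigned rm,
                                  unsigned imm4, bool q) {
  return 0x2E000000u | (q ? (1u << 30) : 0u) | ((rm & 0x1Fu) << 16) |
         ((imm4 & 0xFu) << 11) | ((rn & 0x1Fu) << 5) | (rd & 0x1Fu);
}

// Split a word into little-endian bytes, the order used in the
// "encoding: [...]" comments.
constexpr std::array<std::uint8_t, 4> toLittleEndianBytes(std::uint32_t w) {
  return {static_cast<std::uint8_t>(w & 0xFFu),
          static_cast<std::uint8_t>((w >> 8) & 0xFFu),
          static_cast<std::uint8_t>((w >> 16) & 0xFFu),
          static_cast<std::uint8_t>((w >> 24) & 0xFFu)};
}

// ext v0.8b, v1.8b, v2.8b, #2  ->  [0x20,0x10,0x02,0x2e]
static_assert(encodeExt(0, 1, 2, 2, false) == 0x2E021020u);
// Same instruction with the index field zeroed  ->  [0x20,0x00,0x02,0x2e]
static_assert(encodeExt(0, 1, 2, 0, false) == 0x2E020020u);
```

The second word is also the valid encoding of `ext v0.8b, v1.8b, v2.8b, #0`, so nothing in the emitted object distinguishes the bad input from a legitimate one.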

**Affected Triples:**
- `aarch64-linux-gnu`
- `aarch64_be-linux-gnu`