https://sourceware.org/bugzilla/show_bug.cgi?id=33943

            Bug ID: 33943
           Summary: gas: .prefalign directive for body-size-dependent
                    function alignment
           Product: binutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gas
          Assignee: unassigned at sourceware dot org
          Reporter: maskray at sourceware dot org
  Target Milestone: ---

Compilers emit `.p2align 4` (or similar) before functions to align them to a
preferred boundary (e.g. 16 bytes on x86-64). This is good for large functions
but wasteful for small ones: a 3-byte function padded to a 16-byte boundary
wastes up to 15 bytes — 500% overhead.

LLVM recently introduced a new assembler directive, `.prefalign`, which computes
an alignment value based on the size of the section.

```
#### current syntax with surprising behavior; to be revised
.section .text.f,"ax",@progbits
.p2align 2
# 3-byte function body: aligned to std::bit_ceil(3) = 4
.prefalign 16
f:
  ret
  ret
  ret

.section .text.g,"ax",@progbits
.p2align 2
# 16-byte function body: no-op
.prefalign 16
g:
  .space 16
```

The implementation does not actually align the current location, but simply
increases the section alignment (ELF sh_addralign).
I find this behavior surprising and propose the following revision:

.prefalign <pref_align>, <end_sym>, nop
.prefalign <pref_align>, <end_sym>, <fill_byte>

- `pref_align`: the preferred (maximum) alignment, must be a power of 2
- `end_sym`: a symbol marking the end of the code body
- Third operand: `nop` for target-appropriate (variable-size) NOP fill, or an
integer byte value in the range `[0, 255]`

The assembler computes `body_size = end_sym - (directive_location + padding)`
during relaxation and determines the alignment:

- `body_size < pref_align`: align to `std::bit_ceil(body_size)` (the smallest
integral power of two that is not smaller than `body_size`). The alignment is 1
for a body_size of 0 or 1.
- `body_size >= pref_align`: align to `pref_align`
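
The rule above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not assembler code; the helper names `bit_ceil` and `effective_alignment` are hypothetical and chosen for this example.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n; 1 for n <= 1 (mirrors std::bit_ceil)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def effective_alignment(body_size: int, pref_align: int) -> int:
    """Alignment chosen by the proposed .prefalign rule (illustrative only)."""
    if pref_align <= 0 or pref_align & (pref_align - 1):
        raise ValueError("pref_align must be a power of 2")
    if body_size >= pref_align:
        return pref_align          # large body: use the preferred alignment
    return bit_ceil(body_size)     # small body: round the size up to a power of 2
```

Under this sketch, `effective_alignment(3, 16)` gives 4 and `effective_alignment(32, 16)` gives 16, matching the cases discussed in this report.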

The rationale for the small-body rule: if the cache block size is 64 and the
goal is to minimize cache block crossings, aligning to `min(64,
bit_ceil(body_size))` is the minimum alignment that prevents an unnecessary
boundary crossing.
For example, a 12-byte function aligned to bit_ceil(12) = 16 cannot straddle a
64-byte boundary.
A 3-byte function aligned to bit_ceil(3) = 4 cannot straddle a 64-byte
boundary.
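
The claim can be checked by brute force. The sketch below assumes a 64-byte cache block and only considers power-of-two alignments; all function names are hypothetical. It searches for the smallest alignment at which a body of a given size can never cross a block boundary, and that result can be compared against `min(64, bit_ceil(body_size))`.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n; 1 for n <= 1."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def crosses(start: int, size: int, block: int = 64) -> bool:
    """True if bytes [start, start + size) span more than one block."""
    return size > 0 and start // block != (start + size - 1) // block


def can_cross(size: int, align: int, block: int = 64, span: int = 256) -> bool:
    """True if some start address that is a multiple of align crosses a block."""
    return any(crosses(s, size, block) for s in range(0, span, align))


def min_safe_alignment(size: int, block: int = 64) -> int:
    """Smallest power-of-two alignment (capped at block) that avoids crossings."""
    align = 1
    while align < block and can_cross(size, align, block):
        align *= 2
    return align
```

For every body size from 1 to 64, `min_safe_alignment(size)` agrees with `min(64, bit_ceil(size))`. Bodies larger than 64 bytes necessarily cross a boundary, so the cap at 64 is the best that alignment can do.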

To enforce a minimum alignment independently, users emit both `.p2align` and
`.prefalign`.

**Prior art**

GCC's `-flimit-function-alignment` partially addresses this by capping the
`.p2align` max-skip operand based on function size. However, the max-skip
operand is evaluated at parse time, so it cannot reference a forward label:

```asm
# Not supported — forward reference in max-skip
.p2align 4, , end - start
start:
  nop
end:
```

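Because max-skip is evaluated at parse time, the compiler has to compute the
function size itself ahead of assembly, and that estimate ignores
assembler-side adjustments (e.g., span-dependent instruction relaxation).
Even if forward references were accepted, max-skip only chooses between padding
to the full boundary and not padding at all, so it cannot provide alignment
proportional to the body size.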
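
To illustrate the all-or-nothing nature of max-skip, here is a small Python model of `.p2align` padding. It follows the documented gas behavior (if more than max-skip bytes would be needed, no alignment is done); the function name and the sample offsets are hypothetical.

```python
def p2align_padding(offset: int, p2: int, max_skip=None) -> int:
    """Padding emitted by `.p2align p2, , max_skip` at the given offset."""
    align = 1 << p2
    pad = (-offset) % align
    if max_skip is not None and pad > max_skip:
        return 0  # too many bytes to skip: the alignment is not done at all
    return pad
```

With a 3-byte body at offset 6 and a max-skip of 2, the model yields no padding, so the body occupies offsets 6 to 8 and straddles a 4-byte (and 8-byte) boundary. The proposed `.prefalign 16` rule would instead align it to 4.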


**Example**

```asm
.section .text.f,"ax",@progbits
.prefalign 16, .Lf1_end, nop
# 3-byte function body: aligned to std::bit_ceil(3) = 4
nop
nop
nop
.Lf1_end:
.prefalign 16, .Lf2_end, nop
# 32-byte function body: aligned to 16
...
.Lf2_end:
```

**Implementation notes**

- The directive creates a new fragment type whose size is determined
iteratively during the relaxation loop, with the body-size-dependent rule.
- The fill operand is required to make the intent explicit (NOP fill for code,
zero/byte fill for data).
- For targets with linker relaxation (e.g. RISC-V), `.prefalign` padding is
fully resolved at assembly time and does not require `R_RISCV_ALIGN`-style
relocations.
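
The iterative sizing can be sketched with a toy layout model. This is not gas code: the item encoding, the `relax` function, and the assumption that the section starts at an offset aligned to `pref_align` are all hypothetical. Each `.prefalign` item names the index of the item whose start offset plays the role of `end_sym`, and paddings are recomputed until the layout stops changing.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n; 1 for n <= 1."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def relax(items, max_iters: int = 100):
    """Return the padding for each item once the layout reaches a fixed point.

    items: list of ("bytes", size) or ("prefalign", pref_align, end_index).
    end_index == len(items) means "end of section".
    """
    pad = [0] * len(items)
    for _ in range(max_iters):
        # Lay out the section using the current padding guesses.
        offsets, pos = [], 0
        for i, item in enumerate(items):
            offsets.append(pos)
            pos += pad[i] + (item[1] if item[0] == "bytes" else 0)
        offsets.append(pos)  # offset of the end of the section

        changed = False
        for i, item in enumerate(items):
            if item[0] != "prefalign":
                continue
            _, pref_align, end = item
            # body_size = end_sym - (directive_location + padding)
            body = offsets[end] - (offsets[i] + pad[i])
            align = pref_align if body >= pref_align else bit_ceil(body)
            new_pad = (-offsets[i]) % align
            if new_pad != pad[i]:
                pad[i], changed = new_pad, True
        if not changed:
            return pad
    raise RuntimeError("layout did not converge")
```

For the example in this report (a 3-byte body followed by a 32-byte body, both with `.prefalign 16`), `relax([("prefalign", 16, 2), ("bytes", 3), ("prefalign", 16, 4), ("bytes", 32)])` settles on 13 bytes of padding before the second body, which places it at offset 16.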

**References**

- LLVM RFC:
https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019
- Blog post with detailed analysis:
https://maskray.me/blog/2025-08-24-understanding-alignment-from-source-to-object-file
