https://sourceware.org/bugzilla/show_bug.cgi?id=33943
Bug ID: 33943
Summary: gas: .prefalign directive for body-size-dependent
function alignment
Product: binutils
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: gas
Assignee: unassigned at sourceware dot org
Reporter: maskray at sourceware dot org
Target Milestone: ---
Compilers emit `.p2align 4` (or similar) before functions to align them to a
preferred boundary (e.g. 16 bytes on x86-64). This is good for large functions
but wasteful for small ones: a 3-byte function padded to a 16-byte boundary
wastes up to 15 bytes of padding, i.e. up to 500% overhead.
LLVM recently introduced a new assembler directive, `.prefalign`, which computes
the alignment from the size of the section.
```
#### current syntax with surprising behavior; to be revised
.section .text.f,"ax",@progbits
.p2align 2
# 3-byte function body: aligned to std::bit_ceil(3) = 4
.prefalign 16
f:
ret
ret
ret
.section .text.g,"ax",@progbits
.p2align 2
# 16-byte function body: no-op
.prefalign 16
g:
.space 16
```
The implementation does not actually align the current location, but simply
increases the section alignment (ELF sh_addralign).
I find this behavior surprising and propose the following revision:
.prefalign <pref_align>, <end_sym>, nop
.prefalign <pref_align>, <end_sym>, <fill_byte>
- `pref_align`: the preferred (maximum) alignment, must be a power of 2
- `end_sym`: a symbol marking the end of the code body
- Third operand: `nop` for target-appropriate (variable-size) NOP fill, or an
integer byte value in the range `[0, 255]`
The assembler computes `body_size = end_sym - (directive_location + padding)`
during relaxation and determines the alignment:
- `body_size < pref_align`: align to `std::bit_ceil(body_size)` (the smallest
integral power of two that is not smaller than `body_size`). The alignment is 1
for a body_size of 0 or 1.
- `body_size >= pref_align`: align to `pref_align`
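The rule above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not assembler code; `bit_ceil` mirrors C++20 `std::bit_ceil`, and the helper name `effective_alignment` is made up for this example.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two that is >= n (1 for n of 0 or 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def effective_alignment(body_size: int, pref_align: int) -> int:
    """Alignment selected by the proposed rule (hypothetical helper name)."""
    if pref_align <= 0 or pref_align & (pref_align - 1):
        raise ValueError("pref_align must be a power of 2")
    if body_size < pref_align:
        # Small body: align only as much as the body itself needs.
        return bit_ceil(body_size)
    # Large body: cap at the preferred alignment.
    return pref_align
```

With `pref_align = 16`, a 3-byte body gets alignment 4, a 12-byte body gets 16, and a 32-byte body stays at 16.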
The rationale for the small-body rule: if the cache block size is 64 and the
goal is to minimize cache block crossings, aligning to `min(64,
bit_ceil(body_size))` is the minimum alignment that prevents an unnecessary
boundary crossing.
For example, a 12-byte function aligned to bit_ceil(12) = 16 cannot straddle a
64-byte boundary.
A 3-byte function aligned to bit_ceil(3) = 4 cannot straddle a 64-byte
boundary.
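The claim can be checked by brute force. The sketch below assumes a 64-byte cache block and tries every start address that satisfies the alignment `min(64, bit_ceil(size))`; the function names are illustrative only.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two that is >= n (1 for n of 0 or 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def crosses_block(start: int, size: int, block: int = 64) -> bool:
    """True if the byte range [start, start + size) spans more than one block."""
    return size > 0 and start // block != (start + size - 1) // block


def never_crosses(size: int, block: int = 64) -> bool:
    """Check every suitably aligned start address over a few blocks."""
    align = min(block, bit_ceil(size))
    return all(not crosses_block(start, size, block)
               for start in range(0, 4 * block, align))
```

Under these assumptions `never_crosses(size)` holds for every size from 1 to 64. A weaker alignment is not enough: a 12-byte body placed at the 4-aligned offset 60 occupies bytes 60..71 and does cross the boundary at 64.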
To enforce a minimum alignment independently, users emit both `.p2align` and
`.prefalign`.
**Prior art**
GCC's `-flimit-function-alignment` partially addresses this by capping the
`.p2align` max-skip operand based on function size. However, the max-skip
operand is evaluated at parse time, so it cannot reference a forward label:
```asm
# Not supported — forward reference in max-skip
.p2align 4, , end - start
start:
nop
end:
```
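Even if max-skip accepted such expressions, `.p2align` could not achieve
proportional alignment: max-skip only chooses between padding to the full
alignment and not padding at all. The current approach also forces the compiler
to estimate the function size itself, before the assembler has applied its own
adjustments (e.g., span-dependent instruction relaxation).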
**Example**
```asm
.section .text.f,"ax",@progbits
.prefalign 16, .Lf1_end, nop
# 3-byte function body: aligned to std::bit_ceil(3) = 4
nop
nop
nop
.Lf1_end:
.prefalign 16, .Lf2_end, nop
# 32-byte function body: aligned to 16
...
.Lf2_end:
```
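To make the resulting layout concrete, the sketch below places a sequence of bodies one after another under the proposed rule. It assumes the section starts at offset 0 and that body sizes are already final; in a real assembler the sizes can change during relaxation, which is why the padding would be computed iteratively. All names are hypothetical.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two that is >= n (1 for n of 0 or 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def padding_before(offset: int, body_size: int, pref_align: int) -> int:
    """Bytes of fill needed at `offset` so the body starts suitably aligned."""
    align = min(pref_align, bit_ceil(body_size))
    return -offset % align


def layout(body_sizes, pref_align=16):
    """Return a (start_offset, padding) pair for each body, in order."""
    offset = 0
    placed = []
    for size in body_sizes:
        pad = padding_before(offset, size, pref_align)
        offset += pad                 # fill bytes emitted by the directive
        placed.append((offset, pad))
        offset += size                # the body itself
    return placed
```

For the example above, `layout([3, 32])` gives `[(0, 0), (16, 13)]`: the 3-byte body starts at offset 0 with no padding, and the 32-byte body is padded by 13 bytes to start at offset 16. Two consecutive 3-byte bodies, `layout([3, 3])`, give `[(0, 0), (4, 1)]`, so only a single fill byte is spent.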
**Implementation notes**
- The directive creates a new fragment type whose size is determined
iteratively during the relaxation loop, with the body-size-dependent rule.
- The fill operand is required to make the intent explicit (NOP fill for code,
zero/byte fill for data).
- For targets with linker relaxation (e.g. RISC-V), `.prefalign` padding is
fully resolved at assembly time and does not require `R_RISCV_ALIGN`-style
relocations.
**References**
- LLVM RFC:
https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019
- Blog post with detailed analysis:
https://maskray.me/blog/2025-08-24-understanding-alignment-from-source-to-object-file