https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116718
achraf.belrch at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |achraf.belrch at gmail dot com
--- Comment #3 from achraf.belrch at gmail dot com ---
I've been digging into this for a couple of weeks now while getting familiar
with the codebase. I would like to work on it if it's still open (judging by
the recent patches, that seems to be the case).
I plan to send a short RFC to [email protected] before writing the bulk of the
code. Meanwhile, here's a short summary of what I've found so far.
1. Define a new "bpf_fastcall" entry in bpf_attribute_table, as a
*type* attribute on function types with affects_type_identity set, following
the aarch64_vector_pcs precedent.
2. Implement TARGET_FNTYPE_ABI, returning predefined_function_abi
descriptors whose clobber sets are derived from the signature. There are
12 signature-derived sets (r0 iff non-void, r1..rN for N params, so
2 return kinds times N = 0..5), of which the non-void five-param set is
the default ABI. So with NUM_ABI_IDS == 12, full per-signature precision
would fit exactly, consuming all 11 free ids[1].
3. Finally, add a new TARGET_MACHINE_DEPENDENT_REORG hook. After RA (and
after compute_bb_for_insn(), since pass_free_cfg precedes machine
reorg), walk each block backwards using df_simulate_*_backwards. At
every call (the scan self-neutralizes for default-ABI calls, since no
caller-saved register can be live across them), bracket each live
register that the call's ABI preserves with DImode reg<->mem moves at fresh
r10-relative offsets below local_vars_size. These match the existing
move patterns and assemble to the stxdw/ldxdw shapes the kernel expects.
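To make steps 2 and 3 concrete, here is a rough standalone sketch of the
set arithmetic only. It is not GCC code: regmask, clobber_set and
regs_to_bracket are hypothetical names made up for illustration, and the
bit layout (bit i stands for register ri) is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: bit i set means BPF register ri. */
typedef uint16_t regmask;

/* Clobber set derived from a signature, as described in step 2:
   r0 iff the function returns a value, r1..rN for N parameters.
   At most five parameters are passed in registers (r1..r5). */
regmask clobber_set(bool returns_value, unsigned nparams)
{
    regmask m = returns_value ? 1u : 0u;
    for (unsigned i = 1; i <= nparams && i <= 5; i++)
        m |= (regmask)(1u << i);
    return m;
}

/* Registers to bracket with spill/fill around a call, as in step 3:
   live across the call, caller-saved under the default ABI (r0..r5),
   and not in the clobber set of the call's own ABI. */
regmask regs_to_bracket(regmask live_across, regmask clobbers)
{
    const regmask default_caller_saved = 0x3f; /* r0..r5 */
    return (regmask)(live_across & default_caller_saved & ~clobbers);
}
```

For example, clobber_set(true, 2) yields 0x07 (r0, r1, r2), so with r2, r3
and r4 live across the call, regs_to_bracket(0x1c, 0x07) yields 0x18, i.e.
only r3 and r4 get a spill/fill pair. For a default-ABI call the clobber
set is 0x3f and the result is always empty, which is the self-neutralizing
behaviour mentioned above.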
[1] Re. 2: there's an optimization where we can fold r0 into every clobber
set, at the cost of 1 register for `void` callees, since every
`fastcall`-enabled kfunc returns a value. This spares 6 ids for future BPF
ABI variants.
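The id counts can be double-checked with a small standalone enumeration.
This is only a sketch of the arithmetic, not GCC code: sig_clobbers and
count_clobber_sets are hypothetical names, and the mask layout (bit i for
register ri) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PARAMS 5 /* r1..r5 carry arguments */

/* Clobber mask for one signature. With fold_r0 set, r0 is treated as
   clobbered even for void callees. */
uint8_t sig_clobbers(bool returns_value, unsigned nparams, bool fold_r0)
{
    uint8_t m = (returns_value || fold_r0) ? 1u : 0u;
    for (unsigned i = 1; i <= nparams; i++)
        m |= (uint8_t)(1u << i);
    return m;
}

/* Count the distinct clobber sets over all signatures:
   void or non-void, with 0..MAX_PARAMS parameters. */
unsigned count_clobber_sets(bool fold_r0)
{
    bool seen[64] = { false }; /* masks only use bits 0..5 */
    unsigned n = 0;
    for (int ret = 0; ret <= 1; ret++) {
        for (unsigned p = 0; p <= MAX_PARAMS; p++) {
            uint8_t m = sig_clobbers(ret != 0, p, fold_r0);
            if (!seen[m]) {
                seen[m] = true;
                n++;
            }
        }
    }
    return n;
}
```

Without folding this counts 12 sets (11 besides the default ABI); with r0
folded in it counts 6 (5 besides the default), so 11 - 5 = 6 ids are left
over, matching the numbers above.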