llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-mc

Author: Zachary Yedidia (zyedidia)

<details>
<summary>Changes</summary>

This is the third patch in the LFI series, adding the AArch64-specific MCLFIRewriter implementation. The rewriter performs instruction-level sandboxing at the MC layer to enforce Lightweight Fault Isolation guarantees.

The rewriter handles the following categories of instructions:

* Memory accesses (loads/stores): rewritten to use sandboxed addressing via the base register (x27) with RoW (register offset with UXTW) optimization when possible, falling back to a two-instruction guard+access sequence.
* Stack pointer modifications: redirected through a scratch register (x26) and then sandboxed back into SP.
* Link register modifications: guard emission is deferred until the next control flow instruction for PAC compatibility.
* Indirect branches, calls, and returns: target addresses are sandboxed.
* PAC authenticated branches/returns: expanded to their component operations (authenticate + guard + branch).
* System instructions: SVC, MRS/MSR TPIDR_EL0, and DC ZVA are rewritten to use LFI conventions.
* Pre/post-index addressing: decomposed into base access + separate offset update.

Additional features:

* Guard elimination optimization that avoids redundant sandboxing when consecutive memory accesses use the same base register.
* TSFlags-based MemOpAddrMode annotations for classifying instruction addressing modes, used by the rewriter to determine how to sandbox each instruction. This may be generally useful beyond LFI. There is a separate unit test for this feature.
* Configurable sandboxing modes: `+no-lfi-loads` (stores-only) and `+no-lfi-loads,+no-lfi-stores` (jumps-only).
* Documentation updates covering the rewriting rules, context register layout, and all LFI conventions.
* Libunwind modifications to preserve LFI reserved registers during unwinding.

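
As a rough illustration of the guard that the rewriting categories above rely on (`add xD, x27, wN, uxtw` in the documentation diff), here is a minimal C++ model of the arithmetic. The function name `lfiGuard` and the constant `ExampleBase` are hypothetical and do not appear in the patch; this is a sketch of the address computation only, not of the rewriter.

```cpp
#include <cassert>
#include <cstdint>

// Models `add xD, x27, wN, uxtw`: zero-extend the low 32 bits of the
// untrusted value and add them to the sandbox base held in x27.
// (Hypothetical helper name, for illustration only.)
constexpr uint64_t lfiGuard(uint64_t SandboxBase, uint64_t Untrusted) {
  return SandboxBase + static_cast<uint32_t>(Untrusted);
}

// Hypothetical base value, chosen only for this example.
constexpr uint64_t ExampleBase = 0x0000004000000000ULL;

// The upper 32 bits of the untrusted value are discarded...
static_assert(lfiGuard(ExampleBase, 0xdeadbeef00001234ULL) ==
                  ExampleBase + 0x1234,
              "upper bits must be dropped");
// ...so the result always lies in [base, base + 2^32).
static_assert(lfiGuard(ExampleBase, ~0ULL) == ExampleBase + 0xffffffffULL,
              "largest reachable offset is 2^32 - 1");
```

Because the offset is at most `2^32 - 1`, the guarded value cannot leave the 4 GiB window that starts at the base, whatever the untrusted register held.
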
Part of the MCLFIRewriter interface from the previous PR in the series had to be expanded to support deferring LR guards, which is needed for PAC support.

The rewriter has been tested on the LLVM test suite (using the subset of tests that compile with Musl, see https://github.com/lfi-project/llvm-test-suite). If you run programs compiled with this patch with `lfi-run`, make sure it is built with the `ctxreg=true` option, which will become the default in the future.

## Questions for reviewers

* TSFlags MemAddr info vs. switch tables: the rewriter uses TSFlags to determine addressing modes for memory instructions. Is this the right approach, or would switch tables over opcodes be preferred? We could also split this part of the PR out into a separate PR if desired.
* Should we split the compiler-rt/libunwind modifications into a separate PR?
* SME/SVE ops with addressing modes that aren't covered: some SME/SVE addressing modes (VL-scaled, scatter/gather, tile slices) don't map to existing MemOpAddrMode values. Currently these are considered out-of-scope for LFI, so I didn't add MemOpAddrMode values for them. Is this acceptable, or should they be handled (at least in the MemOpAddrMode tracking) before landing?

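
To make the first question more concrete, the sketch below shows what a flags-based classification can look like: the addressing mode is packed into a few bits of a per-instruction flags word and decoded with a shift and a mask, instead of being looked up through a `switch` over opcodes. Everything here is an assumption for illustration: the `AddrMode` enumerators, the bit position, and the helper names are hypothetical, and the real MemOpAddrMode values and TSFlags layout are not visible in the truncated patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical addressing-mode tags (not the patch's MemOpAddrMode values).
enum class AddrMode : uint8_t {
  None = 0,
  BaseOnly,
  BaseImm,
  BaseReg,
  PreIndex,
  PostIndex
};

// Hypothetical bit layout inside a 64-bit flags word.
constexpr unsigned AddrModeShift = 20;
constexpr uint64_t AddrModeMask = 0x7; // 3 bits hold all six tags above.

// Store the mode in the flags word, leaving all other bits untouched.
constexpr uint64_t encodeAddrMode(uint64_t Flags, AddrMode M) {
  return (Flags & ~(AddrModeMask << AddrModeShift)) |
         (static_cast<uint64_t>(M) << AddrModeShift);
}

// Recover the mode with one shift and one mask; no opcode table is needed.
constexpr AddrMode decodeAddrMode(uint64_t Flags) {
  return static_cast<AddrMode>((Flags >> AddrModeShift) & AddrModeMask);
}

static_assert(decodeAddrMode(encodeAddrMode(0, AddrMode::PostIndex)) ==
                  AddrMode::PostIndex,
              "encode/decode must round-trip");
```

The trade-off in this sketch is that a flags field keeps the classification next to each instruction definition, while a `switch` over opcodes keeps it in the consumer but must be extended by hand for every new opcode.
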
---

Patch is 186.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/184277.diff


41 Files Affected:

- (modified) libunwind/src/DwarfInstructions.hpp (+1-1)
- (modified) libunwind/src/UnwindRegistersRestore.S (+6)
- (modified) llvm/docs/LFI.rst (+186-73)
- (modified) llvm/include/llvm/MC/MCLFIRewriter.h (+5-1)
- (modified) llvm/lib/MC/MCAsmStreamer.cpp (+3)
- (modified) llvm/lib/MC/MCLFIRewriter.cpp (+4)
- (modified) llvm/lib/MC/MCObjectStreamer.cpp (+3)
- (modified) llvm/lib/MC/MCStreamer.cpp (+1-1)
- (modified) llvm/lib/Target/AArch64/AArch64Features.td (+7-1)
- (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+181-2)
- (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+12-2)
- (modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+5)
- (added) llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.cpp (+2070)
- (added) llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCLFIRewriter.h (+144)
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp (+15)
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/CMakeLists.txt (+1)
- (modified) llvm/lib/Target/AArch64/SMEInstrFormats.td (+5)
- (modified) llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h (+69)
- (added) llvm/test/MC/AArch64/LFI/branch.s (+20)
- (added) llvm/test/MC/AArch64/LFI/exclusive.s (+140)
- (added) llvm/test/MC/AArch64/LFI/fp.s (+204)
- (added) llvm/test/MC/AArch64/LFI/guard-elim.s (+149)
- (added) llvm/test/MC/AArch64/LFI/jumps-only.s (+41)
- (added) llvm/test/MC/AArch64/LFI/literal.s (+32)
- (added) llvm/test/MC/AArch64/LFI/lse.s (+166)
- (added) llvm/test/MC/AArch64/LFI/mem.s (+437)
- (added) llvm/test/MC/AArch64/LFI/no-lfi-loads.s (+33)
- (added) llvm/test/MC/AArch64/LFI/other.s (+6)
- (added) llvm/test/MC/AArch64/LFI/pac.s (+55)
- (added) llvm/test/MC/AArch64/LFI/prefetch.s (+81)
- (added) llvm/test/MC/AArch64/LFI/rcpc.s (+19)
- (added) llvm/test/MC/AArch64/LFI/reserved.s (+45)
- (added) llvm/test/MC/AArch64/LFI/return.s (+72)
- (added) llvm/test/MC/AArch64/LFI/simd.s (+472)
- (added) llvm/test/MC/AArch64/LFI/stack.s (+37)
- (added) llvm/test/MC/AArch64/LFI/sys.s (+15)
- (added) llvm/test/MC/AArch64/LFI/tls-reg.s (+13)
- (added) llvm/test/MC/AArch64/LFI/unsupported/literal.s (+26)
- (added) llvm/test/MC/AArch64/LFI/unsupported/pac.s (+13)
- (modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)
- (added) llvm/unittests/Target/AArch64/MemOpAddrModeTest.cpp (+158)


``````````diff
diff --git a/libunwind/src/DwarfInstructions.hpp b/libunwind/src/DwarfInstructions.hpp
index 165c4a99e9a92..32bde2e04ce03 100644
--- a/libunwind/src/DwarfInstructions.hpp
+++ b/libunwind/src/DwarfInstructions.hpp
@@ -226,7 +226,7 @@ int DwarfInstructions<A, R>::stepWithDwarf(
     // __unw_step_stage2 is not used for cross unwinding, so we use
     // __aarch64__ rather than LIBUNWIND_TARGET_AARCH64 to make sure we are
     // building for AArch64 natively.
-#if defined(__aarch64__)
+#if defined(__aarch64__) && !defined(__LFI__)
     if (stage2 && cieInfo.mteTaggedFrame) {
       pint_t sp = registers.getSP();
       pint_t p = sp;
diff --git a/libunwind/src/UnwindRegistersRestore.S b/libunwind/src/UnwindRegistersRestore.S
index 76a80344034f7..a700ed7ce9f47 100644
--- a/libunwind/src/UnwindRegistersRestore.S
+++ b/libunwind/src/UnwindRegistersRestore.S
@@ -678,9 +678,15 @@ DEFINE_LIBUNWIND_FUNCTION(__libunwind_Registers_arm64_jumpto)
   ldp x18,x19, [x0, #0x090]
   ldp x20,x21, [x0, #0x0A0]
   ldp x22,x23, [x0, #0x0B0]
+#ifndef __LFI__
   ldp x24,x25, [x0, #0x0C0]
   ldp x26,x27, [x0, #0x0D0]
   ldp x28,x29, [x0, #0x0E0]
+#else
+  ldp x24,xzr, [x0, #0x0C0]
+  ldp x26,xzr, [x0, #0x0D0]
+  ldp xzr,x29, [x0, #0x0E0]
+#endif
 
 #if defined(__ARM_FP) && __ARM_FP != 0
   ldp d0, d1, [x0, #0x110]
diff --git a/llvm/docs/LFI.rst b/llvm/docs/LFI.rst
index 65d8b70f17e0b..58542266388fe 100644
--- a/llvm/docs/LFI.rst
+++ b/llvm/docs/LFI.rst
@@ -63,15 +63,15 @@ to be applied to hand-written assembly, including inline assembly.
 Compiler Options
 ================
 
-The LFI target has several configuration options.
+The LFI target has several configuration options, specified via ``-mattr=``:
 
-* ``+lfi-loads``: enable sandboxing for loads (default: true).
-* ``+lfi-stores``: enable sandboxing for stores (default: true).
+* ``+no-lfi-loads``: Disable sandboxing for load instructions (stores-only mode).
+* ``+no-lfi-stores``: Disable sandboxing for store instructions.
 
-Use ``+nolfi-loads`` to create a "stores-only" sandbox that may read, but not
+Use ``+no-lfi-loads`` to create a "stores-only" sandbox that may read, but not
 write, outside the sandbox region.
 
-Use ``+nolfi-loads+nolfi-stores`` to create a "jumps-only" sandbox that may
+Use ``+no-lfi-loads,+no-lfi-stores`` to create a "jumps-only" sandbox that may
 read/write outside the sandbox region but may not transfer control outside
 (e.g., may not execute system calls directly). This is primarily useful in
 combination with some other form of memory sandboxing, such as Intel MPK.
@@ -88,7 +88,23 @@ that must be maintained.
 * ``sp``: always holds an address within the sandbox.
 * ``x30``: always holds an address within the sandbox.
 * ``x26``: scratch register.
-* ``x25``: points to a thread-local virtual register file for storing runtime context information.
+* ``x25``: context register (see below).
+
+Context Register
+~~~~~~~~~~~~~~~~
+
+The context register (``x25``) points to a block of thread-local memory managed
+by the LFI runtime. The layout is as follows:
+
++--------+--------+----------------------------------------------+
+| Offset | Size   | Description                                  |
++--------+--------+----------------------------------------------+
+| 0      | 8      | Reserved for use by the LFI runtime.         |
++--------+--------+----------------------------------------------+
+| 8      | 24     | Reserved for future use.                     |
++--------+--------+----------------------------------------------+
+| 32     | 8      | Virtual thread pointer (used for TLS access).|
++--------+--------+----------------------------------------------+
 
 Linker Support
 ==============
@@ -240,73 +256,178 @@ before moving it back into ``sp`` with a safe ``add``.
 Link register modification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When the link register is modified, we write the modified value to a
-temporary, before loading it back into ``x30`` with a safe ``add``.
-
-+-----------------------+----------------------------+
-| Original              | Rewritten                  |
-+-----------------------+----------------------------+
-| .. code-block::       | .. code-block::            |
-|                       |                            |
-|    ldr x30, [...]     |    ldr x26, [...]          |
-|                       |    add x30, x27, w26, uxtw |
-|                       |                            |
-+-----------------------+----------------------------+
-| .. code-block::       | .. code-block::            |
-|                       |                            |
-|    ldp xN, x30, [...] |    ldp xN, x26, [...]      |
-|                       |    add x30, x27, w26, uxtw |
-|                       |                            |
-+-----------------------+----------------------------+
-| .. code-block::       | .. code-block::            |
-|                       |                            |
-|    ldp x30, xN, [...] |    ldp x26, xN, [...]      |
-|                       |    add x30, x27, w26, uxtw |
-|                       |                            |
-+-----------------------+----------------------------+
+When the link register is modified, the guard is deferred until the next
+control flow instruction. This approach maintains compatibility with Pointer
+Authentication Code (PAC) instructions by keeping signed pointers intact until
+they are needed for control flow. The guard uses ``x30`` as both the source and
+destination (``add x30, x27, w30, uxtw``).
+
++---------------------------+-------------------------------+
+| Original                  | Rewritten                     |
++---------------------------+-------------------------------+
+| .. code-block::           | .. code-block::               |
+|                           |                               |
+|    ldr x30, [...]         |    ldr x30, [...]             |
+|    ret                    |    add x30, x27, w30, uxtw    |
+|                           |    ret                        |
+|                           |                               |
++---------------------------+-------------------------------+
+| .. code-block::           | .. code-block::               |
+|                           |                               |
+|    ldp xN, x30, [...]     |    ldp xN, x30, [...]         |
+|    ret                    |    add x30, x27, w30, uxtw    |
+|                           |    ret                        |
+|                           |                               |
++---------------------------+-------------------------------+
+
+Pointer Authentication Code (PAC) support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+LFI is designed to be compatible with ARM Pointer Authentication Code (PAC)
+instructions. PAC signs and authenticates pointers (typically the return
+address in ``x30``) to protect against control-flow hijacking attacks.
+
+To get the security benefits of PAC with LFI-compiled code, the hardware must
+support **FEAT_FPAC** (Faulting PAC), which causes authentication failures to
+immediately fault. Without FEAT_FPAC, a failed authentication produces a
+"poisoned" pointer that only faults when dereferenced, which may not provide
+immediate detection of authentication failures.
+
+**PACIASP** (sign return address) passes through unchanged. It signs the
+current value of ``x30`` using the stack pointer as a modifier, which does not
+affect LFI's security guarantees.
+
+**AUTIASP** (authenticate return address) passes through unchanged. On
+processors with FEAT_FPAC, authentication failure automatically faults.
+
++-------------------+------------------------+
+| Original          | Rewritten              |
++-------------------+------------------------+
+| .. code-block::   | .. code-block::        |
+|                   |                        |
+|    paciasp        |    paciasp             |
+|                   |                        |
++-------------------+------------------------+
+| .. code-block::   | .. code-block::        |
+|                   |                        |
+|    autiasp        |    autiasp             |
+|                   |                        |
++-------------------+------------------------+
+
+Note that the deferred LR guard approach is essential for PAC compatibility.
+If the guard were applied immediately after loading a signed return address,
+it would corrupt the PAC signature, causing subsequent ``autiasp`` to fail.
+By deferring the guard until control flow, signed pointers remain intact
+through the authentication process.
+
+**Authenticated returns** (``retaa``/``retab``) combine authentication with
+return. LFI expands these into their component operations:
+
++-------------------+-------------------------------+
+| Original          | Rewritten                     |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    retaa          |    autiasp                    |
+|                   |    add x30, x27, w30, uxtw    |
+|                   |    ret                        |
+|                   |                               |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    retab          |    autibsp                    |
+|                   |    add x30, x27, w30, uxtw    |
+|                   |    ret                        |
+|                   |                               |
++-------------------+-------------------------------+
+
+**Authenticated branches** (``braa``/``brab``/``braaz``/``brabz``) combine
+authentication with indirect branch. LFI expands these by first authenticating
+the target register, then performing a normal sandboxed branch:
+
++-------------------+-------------------------------+
+| Original          | Rewritten                     |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    braa xN, xM    |    autia xN, xM               |
+|                   |    add x28, x27, wN, uxtw     |
+|                   |    br x28                     |
+|                   |                               |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    braaz xN       |    autiza xN                  |
+|                   |    add x28, x27, wN, uxtw     |
+|                   |    br x28                     |
+|                   |                               |
++-------------------+-------------------------------+
+
+**Authenticated calls** (``blraa``/``blrab``/``blraaz``/``blrabz``) are
+expanded similarly:
+
++-------------------+-------------------------------+
+| Original          | Rewritten                     |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    blraa xN, xM   |    autia xN, xM               |
+|                   |    add x28, x27, wN, uxtw     |
+|                   |    blr x28                    |
+|                   |                               |
++-------------------+-------------------------------+
+| .. code-block::   | .. code-block::               |
+|                   |                               |
+|    blraaz xN      |    autiza xN                  |
+|                   |    add x28, x27, wN, uxtw     |
+|                   |    blr x28                    |
+|                   |                               |
++-------------------+-------------------------------+
+
+**Authenticated exception returns** (``eretaa``/``eretab``) are not supported
+by LFI and will produce an error.
 
 System instructions
 ~~~~~~~~~~~~~~~~~~~
 
 System calls are rewritten into a sequence that loads the address of the first
 runtime call entrypoint and jumps to it. The runtime call entrypoint table is
-stored at the start of the sandbox, so it can be referenced by ``x27``. The
-rewrite also saves and restores the link register, since it is used for
-branching into the runtime.
-
-+-----------------+----------------------------+
-| Original        | Rewritten                  |
-+-----------------+----------------------------+
-| .. code-block:: | .. code-block::            |
-|                 |                            |
-|    svc #0       |    mov w26, w30            |
-|                 |    ldr x30, [x27]          |
-|                 |    blr x30                 |
-|                 |    add x30, x27, w26, uxtw |
-|                 |                            |
-+-----------------+----------------------------+
+stored at a negative offset from the sandbox base, so it can be referenced by
+``x27``. The rewrite also saves and restores the link register, since it is
+used for branching into the runtime.
+
++-----------------+------------------------------+
+| Original        | Rewritten                    |
++-----------------+------------------------------+
+| .. code-block:: | .. code-block::              |
+|                 |                              |
+|    svc #0       |    mov x26, x30              |
+|                 |    ldur x30, [x27, #-8]      |
+|                 |    blr x30                   |
+|                 |    add x30, x27, w26, uxtw   |
+|                 |                              |
++-----------------+------------------------------+
 
 Thread-local storage
 ~~~~~~~~~~~~~~~~~~~~
 
-TLS accesses are rewritten into accesses offset from ``x25``, which is a
-reserved register that points to a virtual register file, with a location for
-storing the sandbox's thread pointer. ``TP`` is the offset into that virtual
-register file where the thread pointer is stored.
-
-+----------------------+-----------------------+
-| Original             | Rewritten             |
-+----------------------+-----------------------+
-| .. code-block::      | .. code-block::       |
-|                      |                       |
-|    mrs xN, tpidr_el0 |    ldr xN, [x25, #TP] |
-|                      |                       |
-+----------------------+-----------------------+
-| .. code-block::      | .. code-block::       |
-|                      |                       |
-|    mrs tpidr_el0, xN |    str xN, [x25, #TP] |
-|                      |                       |
-+----------------------+-----------------------+
+TLS accesses are rewritten into loads/stores from the context register
+(``x25``), which holds the virtual thread pointer at offset 32 (see
+`Context Register`_).
+
++----------------------+-------------------------+
+| Original             | Rewritten               |
++----------------------+-------------------------+
+| .. code-block::      | .. code-block::         |
+|                      |                         |
+|    mrs xN, tpidr_el0 |    ldr xN, [x25, #32]   |
+|                      |                         |
++----------------------+-------------------------+
+| .. code-block::      | .. code-block::         |
+|                      |                         |
+|    msr tpidr_el0, xN |    str xN, [x25, #32]   |
+|                      |                         |
++----------------------+-------------------------+
 
 Optimizations
 =============
@@ -335,22 +456,14 @@ can be removed.
 Address generation
 ~~~~~~~~~~~~~~~~~~
 
+**Note**: this optimization has not been implemented.
+
 Addresses to global symbols in position-independent executables are frequently
 generated via ``adrp`` followed by ``ldr``. Since the address generated by
 ``adrp`` can be statically guaranteed to be within the sandbox, it is safe to
 directly target ``x28`` for these sequences. This allows the omission of a
 guard instruction before the ``ldr``.
 
-+----------------------+-----------------------+
-| Original             | Rewritten             |
-+----------------------+-----------------------+
-| .. code-block::      | .. code-block::       |
-|                      |                       |
-|    adrp xN, target   |    adrp x28, target   |
-|    ldr xN, [xN, imm] |    ldr xN, [x28, imm] |
-|                      |                       |
-+----------------------+-----------------------+
-
 Stack guard elimination
 ~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/llvm/include/llvm/MC/MCLFIRewriter.h b/llvm/include/llvm/MC/MCLFIRewriter.h
index 90f8a9b0e0c09..95972202a25c6 100644
--- a/llvm/include/llvm/MC/MCLFIRewriter.h
+++ b/llvm/include/llvm/MC/MCLFIRewriter.h
@@ -41,6 +41,7 @@ class MCLFIRewriter {
       : Ctx(Ctx), InstInfo(std::move(II)), RegInfo(std::move(RI)) {}
 
   LLVM_ABI void error(const MCInst &Inst, const char Msg[]);
+  LLVM_ABI void warning(const MCInst &Inst, const char Msg[]);
 
   void disable() { Enabled = false; }
   void enable() { Enabled = true; }
@@ -61,7 +62,10 @@ class MCLFIRewriter {
 
   // Called when a label is emitted. Used for optimizations that require
   // information about jump targets, such as guard elimination.
-  virtual void onLabel(const MCSymbol *Symbol) {}
+  virtual void onLabel(const MCSymbol *Symbol, MCStreamer &Out) {}
+
+  // Called at the end of the stream to flush any pending state.
+  virtual void finish(MCStreamer &Out) {}
 };
 
 } // namespace llvm
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 1a50ae43cd9c9..cea014effa121 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -2581,6 +2581,9 @@ void MCAsmStreamer::emitRawTextImpl(StringRef String) {
 }
 
 void MCAsmStreamer::finishImpl() {
+  if (LFIRewriter)
+    LFIRewriter->finish(*this);
+
   // If we are generating dwarf for assembly source files dump out the sections.
   if (getContext().getGenDwarfForAssembly())
     MCGenDwarfInfo::Emit(this);
diff --git a/llvm/lib/MC/MCLFIRewriter.cpp b/llvm/lib/MC/MCLFIRewriter.cpp
index 0ffbc02689aa2..61e64988cd041 100644
--- a/llvm/lib/MC/MCLFIRewriter.cpp
+++ b/llvm/lib/MC/MCLFIRewriter.cpp
@@ -23,6 +23,10 @@ void MCLFIRewriter::error(const MCInst &Inst, const char Msg[]) {
   Ctx.reportError(Inst.getLoc(), Msg);
 }
 
+void MCLFIRewriter::warning(const MCInst &Inst, const char Msg[]) {
+  Ctx.reportWarning(Inst.getLoc(), Msg);
+}
+
 bool MCLFIRewriter::isCall(const MCInst &Inst) const {
   return InstInfo->get(Inst.getOpcode()).isCall();
 }
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index 58aa7945d7393..48eb6b6186dec 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -791,6 +791,9 @@ void MCObjectStreamer::emitAddrsigSym(const MCSymbol *Sym) {
 }
 
 void MCObjectStreamer::finishImpl() {
+  if (LFIRewriter)
+    LFIRewriter->finish(*this);
+
   getContext().RemapDebugPaths();
 
   // If we are generating dwarf for assembly source files dump out the sections.
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index 33c9a05bec114..685e82d6a3633 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -400,7 +400,7 @@ void MCStreamer::emitLabel(MCSymbol *Symbol, SMLoc Loc) {
   Symbol->setFragment(&getCurrentSectionOnly()->getDummyFragment());
 
   if (LFIRewriter)
-    LFIRewriter->onLabel(Symbol);
+    LFIRewriter->onLabel(Symbol, *this);
 
   MCTargetStreamer *TS = getTargetStreamer();
   if (TS)
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index faee640a910d0..c49658510bfbb 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -1060,7 +1060,13 @@ def FeatureHardenSlsNoComdat : SubtargetFeature<"harden-sls-nocomdat",
   "HardenSlsNoComdat", "true",
   "Generate thunk code for SLS mitigation in the normal text section">;
 
-
+// LFI (Lightweight Fault Isolation) features.
+// By default, both loads and stores are sandboxed. Use +no-lfi-loads for
+// stores-only mode, or +no-lfi-loads+no-lfi-stores for jumps-only mode.
+def FeatureNoLFILoads : SubtargetFeature<"no-lfi-loads", "NoLFILoads", "true",
+  "Disable LFI sandboxing for load instructions (stores-only mode)">;
+def FeatureNoLFIStores : SubtargetFeature<"no-lfi-stores", "NoLFIStores", "true",
+  "Disable LFI s...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/184277
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
