On Thu, 13 Nov 2025, Evgeny Karpov wrote:

From: Evgeny Karpov <[email protected]>

Is this the intended email address to use for these contributions - no more @microsoft.com?

Subject: [PATCH] aarch64: Add runtime relocations

The patch implements the required changes to support runtime relocations.
For 26-bit relocation, the linker generates a jump stub, as a single
opcode is not sufficient for relocation.

The supporting binutils patch is being upstreamed.
https://sourceware.org/pipermail/binutils/2025-November/145651.html

A similar change has been upstreamed to Cygwin.
https://cygwin.com/pipermail/cygwin-patches/2025q4/014332.html

Signed-off-by: Evgeny Karpov <[email protected]>
---
 mingw-w64-crt/crt/pseudo-reloc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mingw-w64-crt/crt/pseudo-reloc.c b/mingw-w64-crt/crt/pseudo-reloc.c
index dd08e718a..20c04a1f0 100644
--- a/mingw-w64-crt/crt/pseudo-reloc.c
+++ b/mingw-w64-crt/crt/pseudo-reloc.c
@@ -464,6 +464,31 @@ do_pseudo_reloc (void * start, void * end, void * base)
         case 16:
            __write_memory ((void *) reloc_target, &reldata, 2);
           break;
+#ifdef __aarch64__
+        case 12:
+          /* Replace add Xn, Xn, :lo12:label with ldr Xn, [Xn, :lo12:__imp__func].
+             That loads the address of _func into Xn.  */
+          opcode = 0xf9400000 | (opcode & 0x3ff); // ldr
+          reldata = ((ptrdiff_t) base + r->sym) & ((1 << 12) - 1);
+          reldata >>= 3;
+          opcode |= reldata << 10;
+           __write_memory ((void *) reloc_target, &opcode, 4);
+          break;
+        case 21:
+          /* Replace adrp Xn, label with adrp Xn, __imp__func.  */
+          opcode &= 0x9f00001f;
+          reldata = (((ptrdiff_t) base + r->sym) >> 12)
+                    - (((ptrdiff_t) base + r->target) >> 12);
+          reldata &= (1 << 21) - 1;
+          opcode |= (reldata & 3) << 29;
+          reldata >>= 2;
+          opcode |= reldata << 5;
+           __write_memory ((void *) reloc_target, &opcode, 4);
+          break;
+        /* A note regarding 26 bits relocation.
+           A single opcode is not sufficient for 26 bits relocation in dynamic linking.
+           The linker generates a jump stub instead.  */
+#endif

First off - I would point out that I did consider doing something like this for the case with LLVM/Clang as well, but I decided not to.

Instead, in LLVM/Clang, we generate .refptr indirection - just like GCC also does on x86_64. Doing that avoids a number of problems:

- It avoids having to add support for these new relocations here (including the nitpicky details I'll follow up with below)

- It avoids having to do these relocations in the .text section. Doing that requires changing the permission of the code section to write+execute, which generally is undesirable. (Plus, in special environments such as UWP, it is entirely forbidden to have regions of memory being both writable and executable at the same time.)

- It avoids the issue with how far away the target symbol can be. E.g. on x86_64, a 32 bit relative address isn't big enough if the target is too far away. When this issue does show up, it produces extremely confusing failures, so to help diagnose it better, we added a check here in the pseudo relocation code (see commit ca35236d9799af8a3d2f9baa35b60e6c11abeb24) to error out if the target is too far away to express in the given number of bits.
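
To illustrate the kind of range check meant in the last point, here is a rough sketch. It is not the code from that commit; `fits_in_bits` is a hypothetical helper name, and it assumes a field of the given width may be interpreted either as signed or as unsigned:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if `value` can be stored in a
   relocation field that is `bits` wide (1 <= bits), read back either
   as a signed or as an unsigned quantity.  */
static bool fits_in_bits(int64_t value, unsigned bits)
{
    if (bits >= 64)
        return true;
    /* Largest value representable when the field is read as unsigned.  */
    int64_t max_unsigned = (int64_t)((UINT64_C(1) << bits) - 1);
    /* Smallest value representable when the field is read as signed.  */
    int64_t min_signed = -(int64_t)(UINT64_C(1) << (bits - 1));
    return value >= min_signed && value <= max_unsigned;
}
```

With a check along these lines, a relocation with bits == 12 or 21 is rejected as soon as the full address difference does not fit in the field.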

That said, I see that you've worked around the range issue by rewriting "add Xn, Xn, :lo12:label" into "ldr Xn, [Xn, :lo12:__imp_label]" - which makes the range problem a non-issue. That's neat!

Regarding range, did you actually test this in a mingw setting? The range check code, currently on lines 442-456, should trigger on these relocations (with bits == 12 or 21) and error out, even if you actually do handle a larger range. If you want to go the way of this patch, I'm pretty sure you need to patch the range check as well, to make it not trigger on these relocations.

However - there are two potential flaws with this approach which I don't see how you are solving. (Do you have a full prebuilt toolchain with these patches integrated where I could try it out? I tried https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/releases/download/2025-07-15/aarch64-w64-mingw32-msvcrt-toolchain.tar.gz but that doesn't seem to have these bits enabled yet.)

What you have now works fine for code like this:

    extern int variable;
    int *get_var_addr(void) {
        return &variable;
    }

Where GCC generates code like this:

    get_var_addr:
        adrp    x0, variable
        add     x0, x0, :lo12:variable
        ret

If the linker part of these relocations works in the same way as for the other existing pseudo relocations on x86, then the linker rewrites the symbol references to the undefined "variable" into "__imp_variable" at linking time like this:

    get_var_addr:
        adrp    x0, __imp_variable
        add     x0, x0, :lo12:__imp_variable
        ret

Now this works fine with your pseudo relocation handling, which at runtime turns it into this:

    get_var_addr:
        adrp    x0, __imp_variable
        ldr     x0, [x0, :lo12:__imp_variable]
        ret
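
For reference, the two opcode rewrites can be sketched as standalone functions. This mirrors my reading of the logic in the patch, but uses unsigned types throughout; the function and parameter names are made up for illustration and do not exist in pseudo-reloc.c:

```c
#include <stdint.h>

/* Turn "add Xd, Xn, #imm" into "ldr Xd, [Xn, #off]", where off is the
   low 12 bits of the address of the __imp_ slot (imp_slot).  */
static uint32_t rewrite_add_to_ldr(uint32_t opcode, uint64_t imp_slot)
{
    uint32_t lo12 = (uint32_t)(imp_slot & 0xfff);
    /* Keep Rn and Rd (the low 10 bits) and switch to the base opcode
       of the 64-bit LDR (immediate, unsigned offset).  */
    opcode = UINT32_C(0xf9400000) | (opcode & 0x3ff);
    /* The 64-bit LDR scales its 12-bit immediate by 8.  */
    opcode |= (lo12 >> 3) << 10;
    return opcode;
}

/* Re-encode "adrp Xd, label" so that it addresses the 4 KiB page that
   contains imp_slot, relative to the page of the instruction itself.  */
static uint32_t rewrite_adrp(uint32_t opcode, uint64_t insn_addr,
                             uint64_t imp_slot)
{
    /* Page delta, truncated to the 21 bits that ADRP can encode.  */
    uint32_t pages = (uint32_t)((imp_slot >> 12) - (insn_addr >> 12))
                     & ((UINT32_C(1) << 21) - 1);
    /* Keep the fixed opcode bits and Rd, clear immlo and immhi.  */
    opcode &= UINT32_C(0x9f00001f);
    opcode |= (pages & 3) << 29;   /* immlo: bits 30:29 */
    opcode |= (pages >> 2) << 5;   /* immhi: bits 23:5  */
    return opcode;
}
```

For example, rewriting "add x0, x0, #0" (0x91000000) for a slot whose low 12 bits are 0x018 gives 0xf9400c00, i.e. "ldr x0, [x0, #24]".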

However, what does it do about this case?

    extern int variable;
    int get_var(void) {
        return variable;
    }

With the current versions of GCC, this generates the following code:

    get_var:
        adrp    x0, variable
        ldr     w0, [x0, #:lo12:variable]
        ret

Now in this case, we already have the :lo12: relocation in an ldr instruction, so the pseudo relocation trick no longer works as intended. To fix this case, the pseudo relocation handling code would need to insert an extra ldr instruction after this one.
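
To at least tell the two cases apart at runtime, the relocation code could inspect the instruction before rewriting it. A minimal sketch, not part of the patch; the helper names are hypothetical, and the masks are my reading of the A64 encodings for ADD (immediate) and for the integer LDR (immediate, unsigned offset) forms:

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the 64-bit "add Xd, Xn, #imm{, lsl #12}" form.  */
static bool is_add_imm64(uint32_t opcode)
{
    return (opcode & UINT32_C(0xff800000)) == UINT32_C(0x91000000);
}

/* True for integer "ldr{b,h} Rt, [Xn, #imm]" of any access size,
   i.e. an instruction that already performs a load through :lo12:.  */
static bool is_ldr_unsigned_offset(uint32_t opcode)
{
    return (opcode & UINT32_C(0x3fc00000)) == UINT32_C(0x39400000);
}
```

Detecting the ldr case only allows erroring out cleanly; it does not solve the problem that one more load is needed.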


Secondly, the current mechanism for the pseudo relocations works by _adding_ the difference between the __imp_variable and the actual imported address to the relocation. Not overwriting, but adding. This makes it also work transparently for relocations with a PC-relative address (although that's range limited). It also makes it work for cases where the symbol reference has a built-in offset.
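
In simplified form, the additive fix-up amounts to the following. This is only a sketch of the idea, not the actual code in pseudo-reloc.c; the function and parameter names are made up for illustration:

```c
#include <stdint.h>

/* field:         value currently stored at the relocation target, which
                  the linker resolved against the __imp_ slot (possibly
                  PC-relative, possibly with an addend such as +12).
   imp_slot_addr: address of the __imp_ slot itself.
   imported_addr: pointer value the loader stored in that slot.  */
static int64_t apply_additive_fixup(int64_t field, int64_t imp_slot_addr,
                                    int64_t imported_addr)
{
    /* Only the difference is added, so whatever else was already folded
       into the field (PC bias, symbol offset) is preserved.  */
    return field + (imported_addr - imp_slot_addr);
}
```

If the field held "slot + 12", the result is "imported + 12"; if it held "slot - pc", the result is "imported - pc".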

As for concrete examples to show the issue:

    struct S {
        int a, b, c, d;
    };
    extern struct S s;
    int get_field(void) {
        return s.d;
    }
    int *get_field_addr(void) {
        return &s.d;
    }

Currently with GCC, this produces the following code:

    get_field:
        adrp    x0, s+12
        ldr     w0, [x0, #:lo12:s+12]
        ret

    get_field_addr:
        adrp    x0, s+12
        add     x0, x0, :lo12:s+12
        ret

Now if I understand the code you're proposing correctly, this would drop the +12 offset entirely, and just end up addressing the start of the struct instead.

I guess it's possible to somehow try to work around these issues by making GCC not emit this kind of code at all - to never bake in an offset like this, and never generate a direct "ldr" like in the get_var and get_field cases. (Then you should also amend the linker to check that the symbol offset, when creating such pseudo relocations, has to be zero, as it won't work in the end otherwise.)

So it may be possible to hack around all these issues somehow, but I would suggest instead going the same way as GCC did for x86_64, and as we've done in LLVM/Clang for all architectures, by doing indirection via a .refptr pointer instead. That way, you don't need _any_ code changes to the linker or runtime. (Other than adding support for pseudo relocations for autoimport of full 64 bit addresses, if that needs architecture specific code in binutils.)
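
Expressed at the C level, the .refptr approach is roughly equivalent to the following. This is only an illustration of the idea; the real .refptr stub is emitted by the compiler, `refptr_variable` is a made-up name, and `variable` is defined locally here only to keep the example self-contained:

```c
/* Stand-in for a variable that would really live in another DLL.  */
int variable = 42;

/* Rough equivalent of the compiler-generated .refptr.variable: a full
   pointer-sized slot in a data section.  This slot, not the code, is
   what gets fixed up at load time.  */
int *const refptr_variable = &variable;

int get_var(void)
{
    /* Load the pointer from the slot, then load through it.  */
    return *refptr_variable;
}

int *get_var_addr(void)
{
    /* Taking the address is just one load from the slot.  */
    return refptr_variable;
}
```

Since the fix-up is applied to a full pointer in data, there is no range limit and nothing in .text needs to be made writable.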

// Martin



_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
