On Mon, Jun 23, 2025 at 07:12:58PM -0600, Jeff Law wrote: > > > On 5/20/25 1:22 AM, Stefan Schulze Frielinghaus wrote: > > Currently a register asm already materializes during expand. This > > means, a hard register is allocated for the very first access of a > > register asm as e.g. in an assignment. As a consequence this might lead > > to suboptimal register allocation if the assignment and the using asm > > statement are spread far apart. Even more problematic are function > > calls in between register asm assignments and its using asm statement > > since hard registers may be clobbered by a call. The former may be > > solved by pulling register asm assignments and asm statements close by. > > However, the latter is not easily solved since sometimes function calls > > are implicit. For example > > > > int > > foo (int *x) > > { > > register int y asm ("0") = 42; > > register int z asm ("1") = *x; > > asm ("bar\t%0,%1" : "+r" (z) : "r" (y)); > > return z; > > } > > > > If compiled with address sanitizer, then a function call is introduced > > for the memory load which in turn may interfer with the initialization > > of register asm y. Likewise, for some targets and configurations even > > an operation like an addition may lead to an implicit library call. > > > > In contrast hard register constraints materialize during register > > allocation and therefore do not suffer from this, i.e., asm operands are > > kept in pseudos until RA. This patch adds the feature of rewriting > > local register asm into code which exploits hard register constraints. > > For example > > > > register int global asm ("r3"); > > > > int foo (int x0) > > { > > register int x asm ("r4") = x0; > > register int y asm ("r5"); > > > > asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global)); > > x += 42; > > asm ("baz\t%0,%1" : "=r" (y) : "r" (x)); > > > > return y; > > } > > > > is rewritten during gimplification into > > > > register int global asm ("r3"); > > > > int foo (int x0) > > { > > int x = x0; > > int y; > > > > asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global)); > > x += 42; > > asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x)); > > > > return y; > > } > > > > The resulting code solely relies on hard register constraints modulo > > global register asm. > > > > Since I consider this as an experimental feature it is hidden behind new > > flag -fdemote-register-asm (I'm open for other naming suggestions). > > --- > > gcc/common.opt | 4 + > > gcc/gimplify.cc | 87 +++++++++++++++++++ > > .../gcc.dg/asm-hard-reg-demotion-1.c | 19 ++++ > > .../gcc.dg/asm-hard-reg-demotion-2.c | 19 ++++ > > .../gcc.dg/asm-hard-reg-demotion-error-1.c | 29 +++++++ > > gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h | 52 +++++++++++ > > 6 files changed, 210 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h > So essentially this is range shrinking if I'm reading it correctly and I > would generally consider that a good thing, particularly WRT registers > appearing in ASMs regardless of the precise form. > > As you note, there are oddball cases where it's not obvious where a call > might appear. We don't support it anymore, but the mn102, IIRC did libcalls > for certain extensions and those libcalls wheren't exposed as such. > Similarly the PA has millicode calls for div/mod and perhaps other things I > can't remember anymore. I *think* they're represented as simple div/mod > instructions in the IL (it's been decades since I worked on either of those > targets, so if the exact details are wrong, forgive me)... > > I don't know if the kernel uses ASMs in a way that would trigger this, but > it might be a good testcase. Similarly for glibc.
As written in the other reply for patch 2/4 I will do some more testing including building glibc+kernel. I will include the results in my (final?!) post of this patch series. > > I note that you don't do any demotion for clobbers, is that intentional? Or > am I mistaken on the implementation details? I didn't do any demotion of clobbers since I didn't see any value in it. If a clobbered register gets accidentally clobbered as e.g. by an implicitly introduced call, I wouldn't mind. Again, thanks for reviewing all this :) Stefan > > jeff > > > > > > > diff --git a/gcc/common.opt b/gcc/common.opt > > index 0e50305dde8..34570f7dc4a 100644 > > --- a/gcc/common.opt > > +++ b/gcc/common.opt > > @@ -3514,6 +3514,10 @@ fverbose-asm > > Common Var(flag_verbose_asm) > > Add extra commentary to assembler output. > > +fdemote-register-asm > > +Common Var(flag_demote_register_asm) Init(0) > > +Demote local register asm and use hard register constraints instead. > > + > > fvisibility= > > Common Joined RejectNegative Enum(symbol_visibility) > > Var(default_visibility) Init(VISIBILITY_DEFAULT) > > -fvisibility=[default|internal|hidden|protected] Set the default symbol > > visibility. > > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc > > index 7f66d4f0be6..1bcd63db4ef 100644 > > --- a/gcc/gimplify.cc > > +++ b/gcc/gimplify.cc > > @@ -7859,6 +7859,72 @@ num_alternatives (const_tree link) > > return num + 1; > > } > > +/* Keep track of all register asm which have been replaced by hard register > > + constraints. After all asm statements of a function have been > > processed, > > + demote those to ordinary variables. */ > > +static hash_set<tree> demote_register_asm; > > + > > +static void > > +gimplify_demote_register_asm (tree link) > > +{ > > + tree op = TREE_VALUE (link); > > + if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op)) > > + return; > > + tree id = DECL_ASSEMBLER_NAME (op); > > + const char *regname = IDENTIFIER_POINTER (id); > > + ++regname; > > + int regno = decode_reg_name (regname); > > + if (regno < 0) > > + /* This indicates an error and we error out later on. */ > > + return; > > + const char *constraint > > + = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link))); > > + auto_vec<char, 64> constraint_new; > > + for (const char *p = constraint; *p; ) > > + { > > + bool pushed = false; > > + switch (*p) > > + { > > + case '+': case '=': case '%': case '?': case '!': case '*': case '&': > > + case '#': case '$': case '^': case '{': case 'E': case 'F': case 'G': > > + case 'H': case 's': case 'i': case 'n': case 'I': case 'J': case 'K': > > + case 'L': case 'M': case 'N': case 'O': case 'P': case ',': case '0': > > + case '1': case '2': case '3': case '4': case '5': case '6': case '7': > > + case '8': case '9': case '[': case '<': case '>': case 'g': case 'X': > > + break; > > + > > + default: > > + if (!ISALPHA (*p)) > > + break; > > + enum constraint_num cn = lookup_constraint (p); > > + enum reg_class rclass = reg_class_for_constraint (cn); > > + if (rclass != NO_REGS || insn_extra_address_constraint (cn)) > > + { > > + constraint_new.safe_push ('{'); > > + size_t len = strlen (regname); > > + for (size_t i = 0; i < len; ++i) > > + constraint_new.safe_push (regname[i]); > > + constraint_new.safe_push ('}'); > > + pushed = true; > > + } > > + break; > > + } > > + > > + for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++) > > + { > > + if (!pushed) > > + constraint_new.safe_push (*p); > > + if (*p == '\0') > > + break; > > + } > > + } > > + constraint_new.safe_push ('\0'); > > + unsigned int len = constraint_new.length (); > > + tree str = build_string (len, constraint_new.address ()); > > + TREE_VALUE (TREE_PURPOSE (link)) = str; > > + demote_register_asm.add (op); > > +} > > + > > /* Gimplify the operands of an ASM_EXPR. Input operands should be a > > gimple > > value; output operands should be a gimple lvalue. */ > > @@ -8264,6 +8330,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, > > gimple_seq *post_p) > > /* Do not add ASMs with errors to the gimple IL stream. */ > > if (ret != GS_ERROR) > > { > > + if (flag_demote_register_asm) > > + { > > + for (unsigned i = 0; i < vec_safe_length (outputs); ++i) > > + { > > + tree link = (*outputs)[i]; > > + gimplify_demote_register_asm (link); > > + } > > + for (unsigned i = 0; i < vec_safe_length (inputs); ++i) > > + { > > + tree link = (*inputs)[i]; > > + gimplify_demote_register_asm (link); > > + } > > + } > > + > > stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING > > (expr)), > > inputs, outputs, clobbers, labels); > > @@ -21023,6 +21103,13 @@ gimplify_body (tree fndecl, bool do_parms) > > } > > } > > + for (auto op : demote_register_asm) > > + { > > + DECL_REGISTER (op) = 0; > > + DECL_HARD_REGISTER (op) = 0; > > + } > > + demote_register_asm.empty (); > > + > > if ((flag_openacc || flag_openmp || flag_openmp_simd) > > && gimplify_omp_ctxp) > > { > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > new file mode 100644 > > index 00000000000..541a66a8d05 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > @@ -0,0 +1,19 @@ > > +/* { dg-do run { target aarch64*-*-* powerpc64*-*-* riscv64-*-* s390*-*-* > > x86_64-*-* } } */ > > +/* { dg-options "-O2 -fdemote-register-asm" } */ > > + > > +#include "asm-hard-reg-demotion.h" > > + > > +int > > +main (void) > > +{ > > + if (bar (0) != 0 > > + || bar (1) != 1 > > + || bar (2) != 2 > > + || bar (32) != 32 > > + || baz (0) != 0 > > + || baz (1) != 1 > > + || baz (2) != 2 > > + || baz (32) != 32) > > + __builtin_abort (); > > + return 0; > > +} > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c > > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c > > new file mode 100644 > > index 00000000000..3d216d440af > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c > > @@ -0,0 +1,19 @@ > > +/* { dg-do run { target aarch64*-*-* powerpc64*-*-* riscv64-*-* s390*-*-* > > x86_64-*-* } } */ > > +/* { dg-options "-O2 -fno-demote-register-asm" } */ > > + > > +#include "asm-hard-reg-demotion.h" > > + > > +int > > +main (void) > > +{ > > + if (bar (0) != 42 > > + || bar (1) != 42 > > + || bar (2) != 42 > > + || bar (32) != 42 > > + || baz (0) != 0 > > + || baz (1) != 1 > > + || baz (2) != 2 > > + || baz (32) != 32) > > + __builtin_abort (); > > + return 0; > > +} > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c > > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c > > new file mode 100644 > > index 00000000000..097a6ff901d > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c > > @@ -0,0 +1,29 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-fdemote-register-asm" } */ > > + > > +/* Verify input operands. */ > > + > > +int > > +test (int x) > > +{ > > + register int y __asm__ ("0") = x; > > + > > + /* Preserve status quo for register asm and don't error out. This must > > hold > > + true even if we replace register asm with hard register constraints. > > */ > > + __asm__ ("" : "+r" (y) : "r" (y)); > > + __asm__ ("" : "=r" (y) : "0" (y), "r" (y)); > > + __asm__ ("" : "=r" (y) : "r" (y), "r" (y)); > > + > > + /* Be more strict for hard register constraints and error out. */ > > + __asm__ ("" : "+{0}" (x) : "{0}" (x)); /* { dg-error "multiple inputs to > > hard register" } */ > > + __asm__ ("" : "={0}" (x) : "0" (x), "{0}" (x)); /* { dg-error "multiple > > inputs to hard register" } */ > > + __asm__ ("" : "=r" (x) : "{0}" (x), "{0}" (x)); /* { dg-error "multiple > > inputs to hard register" } */ > > + > > + /* Still error out in case of a mixture of both constructs. */ > > + __asm__ ("" : "+{0}" (x) : "r" (y)); /* { dg-error "multiple inputs to > > hard register" } */ > > + __asm__ ("" : "={0}" (x) : "0" (x), "r" (y)); /* { dg-error "multiple > > inputs to hard register" } */ > > + __asm__ ("" : "+r" (y) : "{0}" (x)); /* { dg-error "multiple inputs to > > hard register" } */ > > + __asm__ ("" : "=r" (y) : "0" (y), "{0}" (x)); /* { dg-error "multiple > > inputs to hard register" } */ > > + > > + return x + y; > > +} > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h > > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h > > new file mode 100644 > > index 00000000000..6d72f622ce9 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h > > @@ -0,0 +1,52 @@ > > +/* Pass parameter x in the first general argument register to the assembler > > + instruction. > > + > > + In function bar we fail to do so because after the function call to foo, > > + variable argreg1 does not contain the value of x but rather 42 which got > > + passed to foo. Thus, the function always returns 42. In contrast in > > + function baz, variable x is saved over the function call and > > materializes in > > + the asm statement and therefore is returned. */ > > + > > +#if defined (__aarch64__) > > +# define REG register int argreg1 __asm__ ("x0") = x; > > +# define MOVE1 __asm__ ("mov\t%0,%1" : "=r" (out) : "r" (argreg1)); > > +# define MOVE2 __asm__ ("mov\t%0,%1" : "=r" (out) : "{x0}" (x)); > > +#elif defined (__powerpc__) || defined (__POWERPC__) > > +# define REG register int argreg1 __asm__ ("r3") = x; > > +# define MOVE1 __asm__ ("mr\t%0,%1" : "=r" (out) : "r" (argreg1)); > > +# define MOVE2 __asm__ ("mr\t%0,%1" : "=r" (out) : "{r3}" (x)); > > +#elif defined (__riscv) > > +# define REG register int argreg1 __asm__ ("a0") = x; > > +# define MOVE1 __asm__ ("mv\t%0,%1" : "=r" (out) : "r" (argreg1)); > > +# define MOVE2 __asm__ ("mv\t%0,%1" : "=r" (out) : "{a0}" (x)); > > +#elif defined (__s390__) > > +# define REG register int argreg1 __asm__ ("r2") = x; > > +# define MOVE1 __asm__ ("lr\t%0,%1" : "=r" (out) : "r" (argreg1)); > > +# define MOVE2 __asm__ ("lr\t%0,%1" : "=r" (out) : "{r2}" (x)); > > +#elif defined (__x86_64__) > > +# define REG register int argreg1 __asm__ ("edi") = x; > > +# define MOVE1 __asm__ ("mov\t%1,%0" : "=r" (out) : "r" (argreg1)); > > +# define MOVE2 __asm__ ("mov\t%1,%0" : "=r" (out) : "{edi}" (x)); > > +#endif > > + > > +__attribute__ ((noipa)) > > +int foo (int unused) { } > > + > > +int > > +bar (int x) > > +{ > > + int out; > > + REG > > + foo (42); > > + MOVE1 > > + return out; > > +} > > + > > +int > > +baz (int x) > > +{ > > + int out; > > + foo (42); > > + MOVE2 > > + return out; > > +} >