On Wed, May 20, 2026 at 07:43:37AM +0200, Richard Biener wrote: > On Tue, May 19, 2026 at 2:26 PM Stefan Schulze Frielinghaus > <[email protected]> wrote: > > > > On Tue, May 19, 2026 at 08:12:35AM +0200, Richard Biener wrote: > > > On Mon, May 18, 2026 at 4:33 PM Stefan Schulze Frielinghaus > > > <[email protected]> wrote: > > > > > > > > From: Stefan Schulze Frielinghaus <[email protected]> > > > > > > > > Currently local register asm assignments materialize during expand into > > > > assignments utilizing hard registers. Since hard registers or more > > > > precisely objects residing in hard registers are not tracked > > > > individually, those are subject to be clobbered. Well known and > > > > documented are function calls which may clobber hard registers used for > > > > register asm objects. For example, compiling on aarch64 > > > > > > > > register int x asm ("x0") = 0x123; > > > > register int y asm ("x1") = *ptr; > > > > > > > > using address sanitizers results in > > > > > > > > x0:SI=0x123 > > > > x0:DI=r104:DI > > > > call [`__asan_load4'] argc:0 > > > > x1:SI=[r104:DI] > > > > > > > > The implicit function call added by the address sanitizer clobbers > > > > argument register x0 which was previously set for the register asm > > > > object. > > > > > > > > With the advent of hard register constraints, this can be overcome. > > > > Instead of expanding a register asm assignment directly into a hard > > > > register assignment, keep the register asm object in a pseudo for as > > > > long as possible and use a hard register constraint in Extended Asm > > > > statements which ensures that the object is finally allocated the > > > > respective hard register. Since local register asm is supposed to have > > > > an effect only for Extended Asm statements, this coincides with hard > > > > register constraints which materialize for the respective insn. 
> > > > > > > > This patch adds the feature of rewriting local register asm into code > > > > which exploits hard register constraints. For example > > > > > > > > register int global asm ("r3"); > > > > > > > > int foo (int x0) > > > > { > > > > register int x asm ("r4") = x0; > > > > register int y asm ("r5"); > > > > > > > > asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global)); > > > > x += 42; > > > > asm ("baz\t%0,%1" : "=r" (y) : "r" (x)); > > > > > > > > return y; > > > > } > > > > > > > > is rewritten during gimplification into > > > > > > > > register int global asm ("r3"); > > > > > > > > int foo (int x0) > > > > { > > > > register int tmp asm ("r5"); > > > > int x = x0; > > > > int y = tmp; > > > > > > > > asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global)); > > > > x += 42; > > > > asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x)); > > > > > > > > return y; > > > > } > > > > > > > > Note, uninitialized register asm objects may be used as inputs. Thus, > > > > if naively translated into hard register constraints, this would > > > > introduce reads from uninitialized objects which init-regs pass would > > > > fix up which in turn would mean that once hard register constraints > > > > materialize, respective registers would be zeroed (see comment in > > > > gimplify.cc for more details). This is solved by initializing every > > > > uninitialized register asm object by a fresh register asm object > > > > ensuring that it contains the respective register value. Subsequent > > > > passes remove dead stores in case those objects are eventually > > > > initialized at later points or are used exclusively as output operands. > > > > Therefore, in most cases, those temporary register asm objects won't > > > > materialize. This is not pretty at all but required in order to compile > > > > real world applications as e.g. glibc for target powerpc64le. > > > > > > > > Hard register constraints are more strict in order to prevent subtle > > > > bugs. 
This in turn means that certain programs are not valid after > > > > register asm demotion. For example, > > > > > > > > register int x asm ("r5") = 42; > > > > asm ("" : "+r" (x) : "r" (x)); > > > > > > > > is rewritten into > > > > > > > > int x = 42; > > > > asm ("" : "+{r5}" (x) : "{r5}" (x)); > > > > > > > > Now, two inputs refer to the very same register which is invalid. This > > > > example could have been massaged to make it fit, however, there are > > > > other examples which cannot. Currently, I lean towards rejecting those > > > > instead of fixing up, since those look like subtle bugs. > > > > > > Hmm, but local hardregs are supposed to be only used by extended > > > asm as a way to constrain inputs. With hardreg constraints they should > > > no longer be necessary. So - shouldn't we take the more aggressive > > > approach and diagnose them as being deprecated and point to > > > hardreg constraints? Can we even rewrite uses to hardreg constraints > > > (by rewriting the asm regs into SSA which, I think, we currently avoid)? > > > > The whole point of this patch is to automatically rewrite register asm > > objects into ordinary objects utilizing hard register constraints so > > that old code which is still depending on register asm profits from hard > > register constraints. > > Oops, I didn't look at the patch and infered a wrong idea about what it > does from the description. It seems to be exactly doing what I was > suggesting. > > > However, for hard register constraints I have > > been way more strict or in other words with register asm you have more > > freedom. Therefore, automatically translating those into hard register > > constraints will certainly fail here and there. You could argue that > > the example from above could be massaged to make it fit with hard > > register constraints. 
However, I was questioning whether it is > > worthwhile to implement this kind of logic since with register asm it is > > easy to come up with code which cannot be massaged easily or even at all > > since the semantics is up to my knowledge not clearly defined. For > > example > > > > register int x asm ("r5") = 42; > > register int y asm ("r5") = 24; > > asm ("" : "=r" (x) : "r" (x), "r" (y)); > > Uh ... > > > This kind of code is accepted by gcc/clang at the moment. My gut > > feeling is that this shouldn't have been accepted and is rather a side > > effect of the implementation but I might be completely wrong here. That > > being said with this patch, if flag -fdemote-register-asm is > > specified, then the code is rewritten into > > > > x = 42; > > y = 24; > > asm ("" : "={r5}" x : "{r5}" x, "{r5}" y); > > > > for which we error out since we have two inputs bound to the very same > > register. Since all those examples look like subtle bugs to me, I think > > it is better to diagnose those what the current implementation does > > (although I think the error message could be improved here and there). > > Maybe we should, for extra clarity, name -fdemote-register-asm as > -fstrict-register-asm and document it to be eventually the default.
I very much like this idea since it gives the user some sort of
intuition, especially when code that used to compile now breaks.

> Could
> we run the demotion analysis-only by default and diagnose cases like the
> above? Or is there no way to achieve this?

Even with -fno-strict-register-asm I could still diagnose cases like the
above, which could help spot potential problems up front. Actually, a
couple of months ago I was thinking of giving a distro build a try in
order to get a better picture of how register asm is used in real-world
applications. Having a diagnostics-only solution could help with this.
I will come up with something and send a new revision.

Cheers,
Stefan

>
> Thanks,
> Richard.
>
> > Cheers,
> > Stefan
> >
> > >
> > > Richard.
> > >
> > > >
> > > > Since I consider this as an experimental feature it is hidden behind new
> > > > flag -fdemote-register-asm.
> > > > ---
> > > >
> > > > Notes:
> > > > Patch v2 vs this one
> > > > --------------------
> > > >
> > > > Patch v2
> > > >
> > > > https://inbox.sourceware.org/gcc-patches/[email protected]/
> > > > was the last one I posted.
> > > >
> > > > This patch keeps the behaviour if a register of a register asm object is
> > > > not entailed in the register class of a corresponding constraint.
> > > > Although, I would have rather liked to throw an error since this looks
> > > > like a subtle bug to me, I kept this behaviour for the sake of
> > > > compatibility.
> > > >
> > > > Furthermore, this patch also deals with uninitialized reads of register
> > > > asm objects. Again, this rather sounds like a bug to me but for the
> > > > sake of compatibility I kept this behaviour.
> > > > > > > > Testing > > > > ------- > > > > > > > > Bootstrapped on > > > > - aarch64-unknown-linux-gnu > > > > - powerpc64le-unknown-linux-gnu > > > > - s390x-ibm-linux-gnu > > > > - x86_64-pc-linux-gnu > > > > > > > > Build and regtested glibc on > > > > - powerpc64le-unknown-linux-gnu > > > > - s390x-ibm-linux-gnu > > > > - x86_64-pc-linux-gnu > > > > > > > > Build Linux on > > > > - aarch64-unknown-linux-gnu > > > > - s390x-ibm-linux-gnu > > > > - x86_64-pc-linux-gnu (*) > > > > > > > > (*) For x86_64 the Linux kernel exposes a usage of register asm > > > > which > > > > leads to an error. The usage pattern is similar to the one > > > > described > > > > in the commit message > > > > > > > > register int x asm ("r5") = 42; > > > > asm ("" : "+r" (x) : "r" (x)); > > > > > > > > For testing purposes I replaced the in-out operand by an out > > > > operand and > > > > the Linux kernel compiles fine on x86_64. See for more details: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662725.html > > > > > > > > Ok for mainline? > > > > > > > > gcc/common.opt | 4 + > > > > gcc/gimplify.cc | 186 ++++++++++++++++++ > > > > .../gcc.dg/asm-hard-reg-demotion-1.c | 74 +++++++ > > > > 3 files changed, 264 insertions(+) > > > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > > > > > > > diff --git a/gcc/common.opt b/gcc/common.opt > > > > index 3ad1444cc88..f0bc8492190 100644 > > > > --- a/gcc/common.opt > > > > +++ b/gcc/common.opt > > > > @@ -3588,6 +3588,10 @@ fverbose-asm > > > > Common Var(flag_verbose_asm) > > > > Add extra commentary to assembler output. > > > > > > > > +fdemote-register-asm > > > > +Common Var(flag_demote_register_asm) Init(0) > > > > +Demote local register asm and use hard register constraints instead. 
> > > > + > > > > fvisibility= > > > > Common Joined RejectNegative Enum(symbol_visibility) > > > > Var(default_visibility) Init(VISIBILITY_DEFAULT) > > > > -fvisibility=[default|internal|hidden|protected] Set the default > > > > symbol visibility. > > > > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc > > > > index e4db4b1d9bd..c6560c14f48 100644 > > > > --- a/gcc/gimplify.cc > > > > +++ b/gcc/gimplify.cc > > > > @@ -2246,6 +2246,40 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq > > > > *seq_p) > > > > && clear_padding_type_may_have_padding_p (TREE_TYPE > > > > (decl))) > > > > gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p); > > > > } > > > > + else if (flag_demote_register_asm && !init && DECL_HARD_REGISTER > > > > (decl)) > > > > + { > > > > + /* Register asm objects may be used uninitialized as inputs. > > > > + Therefore, a naive translation into hard register > > > > constraints > > > > + would render the demoted objects to be default initialized > > > > by > > > > + init-regs and as a consequence, once hard register > > > > constraints > > > > + materialize for an Extended Asm, hard registers would be > > > > zeroed > > > > + which didn't happen before. Therefore, to overcome this, > > > > + initialize each demoted object with the current contents > > > > of the > > > > + hard register it previously referred to. For example, > > > > translate > > > > + > > > > + register int x asm ("r5"); > > > > + int y; > > > > + asm ("..." : "=r" (y) : "r" (x)); > > > > + > > > > + to > > > > + > > > > + register int tmp asm ("r5"); > > > > + int x = tmp; > > > > + int y; > > > > + asm ("..." : "=r" (y) : "{r5}" (x)); > > > > + > > > > + Do this unconditionally for all uninitialized register asm > > > > objects > > > > + and let subsequent passes remove dead stores in case those > > > > objects > > > > + are initialized at later points or are used exclusively as > > > > output > > > > + operands. 
*/ > > > > + tree tmp = create_tmp_var (TREE_TYPE (decl), "regasm"); > > > > + SET_DECL_ASSEMBLER_NAME (tmp, DECL_ASSEMBLER_NAME (decl)); > > > > + DECL_REGISTER (tmp) = 1; > > > > + DECL_HARD_REGISTER (tmp) = 1; > > > > + DECL_INITIAL (decl) = tmp; > > > > + *stmt_p = stmt; > > > > + return gimplify_decl_expr (stmt_p, seq_p); > > > > + } > > > > } > > > > > > > > return GS_ALL_DONE; > > > > @@ -7976,6 +8010,137 @@ num_alternatives (const_tree link) > > > > return num + 1; > > > > } > > > > > > > > +static inline bool > > > > +rclass_entails_registers (enum reg_class rclass, int regno, int nregs) > > > > +{ > > > > + for (int i = regno; i < regno + nregs ; ++i) > > > > + if (!TEST_HARD_REG_BIT (reg_class_contents[rclass], i)) > > > > + return false; > > > > + return true; > > > > +} > > > > + > > > > +/* Keep track of all register asm which have been replaced by hard > > > > register > > > > + constraints. After all asm statements of a function have been > > > > processed, > > > > + demote those to ordinary objects. */ > > > > +static hash_set<tree> demote_register_asm; > > > > + > > > > +/* Rewrite constraints of Extended Asm operands which refer to local > > > > register > > > > + asm objects into hard register constraints. Also mark those > > > > objects to be > > > > + demoted from register asm objects to ordinary objects which is done > > > > + basically after gimplification of the function body. > > > > + > > > > + For example, the following translation unit > > > > + > > > > + register int global asm ("r3"); > > > > + > > > > + int foo (int x0) > > > > + { > > > > + register int x asm ("r4") = x0; > > > > + register int y asm ("r5"); > > > > + > > > > + asm ("..." : "=r" (x) : "0" (x), "r" (global)); > > > > + x += 42; > > > > + asm ("..." 
: "=r" (y) : "r" (x)); > > > > + > > > > + return y; > > > > + } > > > > + > > > > + is rewritten into > > > > + > > > > + register int global asm ("r3"); > > > > + > > > > + int foo (int x0) > > > > + { > > > > + register int tmp asm ("r5"); > > > > + int x = x0; > > > > + int y = tmp; > > > > + > > > > + asm ("..." : "={r4}" (x) : "0" (x), "r" (global)); > > > > + x += 42; > > > > + asm ("..." : "={r5}" (y) : "{r4}" (x)); > > > > + > > > > + return y; > > > > + } > > > > + > > > > + Any local register asm which is not initialized at its declaration, > > > > is > > > > + implicitly initialized with the contents of the respective hard > > > > register. > > > > + See gimplify_decl_expr() for more details. Ideally we would error > > > > out in > > > > + those cases, however, for the sake of compatibility keep the current > > > > + behaviour. > > > > + > > > > + Note, only rewrite a constraint in case it entails the registers > > > > referred to > > > > + by the corresponding register asm object. For example, assume that > > > > register > > > > + f5 is a floating-point register and is therefore not included in the > > > > + register class associated by constraint r. > > > > + > > > > + register float x asm ("f5"); > > > > + asm ("..." : "=r" (x)); > > > > + > > > > + Then the constraint is not altered. However, the register asm > > > > object is > > > > + still demoted to an ordinary object which means we finally end up > > > > with > > > > + > > > > + float x; > > > > + asm ("..." : "=r" (x)); > > > > + > > > > + Ideally we would error out here since this rather looks like a bug, > > > > however, > > > > + for the sake of compatibility, preserve the current behaviour of > > > > register > > > > + asm. 
*/ > > > > + > > > > +static void > > > > +gimplify_demote_register_asm (tree link) > > > > +{ > > > > + tree op = TREE_VALUE (link); > > > > + if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op)) > > > > + return; > > > > + tree id = DECL_ASSEMBLER_NAME (op); > > > > + const char *regname = IDENTIFIER_POINTER (id); > > > > + ++regname; > > > > + int regno = decode_reg_name (regname); > > > > + if (regno < 0) > > > > + /* This indicates an error and we error out later on. */ > > > > + return; > > > > + /* Currently, fixed registers cannot be used for hard register > > > > constraints > > > > + which is why we skip those for the moment. */ > > > > + if (fixed_regs[regno]) > > > > + return; > > > > + machine_mode mode = TYPE_MODE (TREE_TYPE (op)); > > > > + int nregs = hard_regno_nregs (regno, mode); > > > > + const char *constraint > > > > + = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link))); > > > > + auto_vec<char, 64> constraint_new; > > > > + for (const char *p = constraint; *p; ) > > > > + { > > > > + bool changed_p = false; > > > > + enum constraint_num cn = lookup_constraint (p); > > > > + enum reg_class rclass = reg_class_for_constraint (cn); > > > > + if (rclass != NO_REGS && rclass_entails_registers (rclass, > > > > regno, nregs)) > > > > + { > > > > + /* At this point we have a constraint which entails all the > > > > registers > > > > + required by the register asm operand. Therefore, rewrite > > > > the > > > > + constraint into a corresponding hard register constraint. 
> > > > */ > > > > + constraint_new.safe_push ('{'); > > > > + size_t len = strlen (regname); > > > > + for (size_t i = 0; i < len; ++i) > > > > + constraint_new.safe_push (regname[i]); > > > > + constraint_new.safe_push ('}'); > > > > + changed_p = true; > > > > + } > > > > + > > > > + for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++) > > > > + { > > > > + if (!changed_p) > > > > + constraint_new.safe_push (*p); > > > > + if (*p == '\0') > > > > + break; > > > > + } > > > > + } > > > > + constraint_new.safe_push ('\0'); > > > > + unsigned int len = constraint_new.length (); > > > > + tree str = build_string (len, constraint_new.address ()); > > > > + TREE_VALUE (TREE_PURPOSE (link)) = str; > > > > + demote_register_asm.add (op); > > > > +} > > > > + > > > > /* Gimplify the operands of an ASM_EXPR. Input operands should be a > > > > gimple > > > > value; output operands should be a gimple lvalue. */ > > > > > > > > @@ -8372,6 +8537,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq > > > > *pre_p, gimple_seq *post_p) > > > > /* Do not add ASMs with errors to the gimple IL stream. 
*/ > > > > if (ret != GS_ERROR) > > > > { > > > > + if (flag_demote_register_asm) > > > > + { > > > > + for (unsigned i = 0; i < vec_safe_length (outputs); ++i) > > > > + { > > > > + tree link = (*outputs)[i]; > > > > + gimplify_demote_register_asm (link); > > > > + } > > > > + for (unsigned i = 0; i < vec_safe_length (inputs); ++i) > > > > + { > > > > + tree link = (*inputs)[i]; > > > > + gimplify_demote_register_asm (link); > > > > + } > > > > + } > > > > + > > > > stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING > > > > (expr)), > > > > inputs, outputs, clobbers, labels); > > > > > > > > @@ -21874,6 +22053,13 @@ gimplify_body (tree fndecl, bool do_parms) > > > > } > > > > } > > > > > > > > + for (auto op : demote_register_asm) > > > > + { > > > > + DECL_REGISTER (op) = 0; > > > > + DECL_HARD_REGISTER (op) = 0; > > > > + } > > > > + demote_register_asm.empty (); > > > > + > > > > if ((flag_openacc || flag_openmp || flag_openmp_simd) > > > > && gimplify_omp_ctxp) > > > > { > > > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > > > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > > > new file mode 100644 > > > > index 00000000000..851adb3af40 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c > > > > @@ -0,0 +1,74 @@ > > > > +/* { dg-do compile { target aarch64*-*-* s390*-*-* x86_64-*-* } } */ > > > > +/* { dg-additional-options "-fdemote-register-asm -fdump-tree-gimple" > > > > } */ > > > > +/* { dg-additional-options "-msse2" { target x86_64-*-* } } */ > > > > + > > > > +#if __aarch64__ > > > > +# define GPR "r5" > > > > +# define FPR "d5" > > > > +# define CSTR_GPR "r" > > > > +# define CSTR_FPR "w" > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" > > > > x0\\);" 1 "gimple" { target aarch64-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{d5\}\" > > > > x1\\);" 1 "gimple" { target aarch64-*-* } } } */ > > > > +/* { dg-final { 
scan-tree-dump-times "__asm__\\(\"\" : \"=w\" x2\\);" > > > > 1 "gimple" { target aarch64-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" > > > > 1 "gimple" { target aarch64-*-* } } } */ > > > > +#elif __s390__ > > > > +# define GPR "r5" > > > > +# define FPR "f5" > > > > +# define CSTR_GPR "r" > > > > +# define CSTR_FPR "f" > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" > > > > x0\\);" 1 "gimple" { target s390*-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{f5\}\" > > > > x1\\);" 1 "gimple" { target s390*-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=f\" x2\\);" > > > > 1 "gimple" { target s390*-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" > > > > 1 "gimple" { target s390*-*-* } } } */ > > > > +#elif __x86_64__ > > > > +# define GPR "cx" > > > > +# define FPR "xmm5" > > > > +# define CSTR_GPR "r" > > > > +# define CSTR_FPR "x" > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{cx\}\" > > > > x0\\);" 1 "gimple" { target x86_64-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{xmm5\}\" > > > > x1\\);" 1 "gimple" { target x86_64-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=x\" x2\\);" > > > > 1 "gimple" { target x86_64-*-* } } } */ > > > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" > > > > 1 "gimple" { target x86_64-*-* } } } */ > > > > +#else > > > > +# error unsupported target > > > > +#endif > > > > + > > > > +/* Rewrite constraints into hard register constraints and demote > > > > register asm > > > > + objects into ordinary objects. 
*/ > > > > + > > > > +int > > > > +test_gpr_constraint_gpr_register (void) > > > > +{ > > > > + register int x0 __asm__ (GPR); > > > > + __asm__ ("" : "="CSTR_GPR (x0)); > > > > + return x0; > > > > +} > > > > + > > > > +float > > > > +test_fpr_constraint_fpr_register (void) > > > > +{ > > > > + register float x1 __asm__ (FPR); > > > > + __asm__ ("" : "="CSTR_FPR (x1)); > > > > + return x1; > > > > +} > > > > + > > > > +/* The following two tests are unusual in the sense that the register > > > > is not > > > > + subsumed by the constraint. Keep the current behaviour by not > > > > changing the > > > > + constraints and only demote the register asm objects into ordinary > > > > objects. > > > > + Erroring out would be probably better since this could be a subtle > > > > bug. */ > > > > + > > > > +int > > > > +test_fpr_constraint_gpr_register (void) > > > > +{ > > > > + register int x2 __asm__ (GPR); > > > > + __asm__ ("" : "="CSTR_FPR (x2)); > > > > + return x2; > > > > +} > > > > + > > > > +float > > > > +test_gpr_constraint_fpr_register (void) > > > > +{ > > > > + register float x3 __asm__ (FPR); > > > > + __asm__ ("" : "="CSTR_GPR (x3)); > > > > + return x3; > > > > +} > > > > -- > > > > 2.53.0 > > > >
