On Tue, May 19, 2026 at 2:26 PM Stefan Schulze Frielinghaus
<[email protected]> wrote:
>
> On Tue, May 19, 2026 at 08:12:35AM +0200, Richard Biener wrote:
> > On Mon, May 18, 2026 at 4:33 PM Stefan Schulze Frielinghaus
> > <[email protected]> wrote:
> > >
> > > From: Stefan Schulze Frielinghaus <[email protected]>
> > >
> > > Currently, local register asm assignments materialize during expand as
> > > assignments to hard registers. Since hard registers, or more precisely
> > > objects residing in hard registers, are not tracked individually, they
> > > are subject to being clobbered. A well-known and documented case is
> > > function calls, which may clobber hard registers used for register asm
> > > objects. For example, compiling on aarch64
> > >
> > > register int x asm ("x0") = 0x123;
> > > register int y asm ("x1") = *ptr;
> > >
> > > with the address sanitizer enabled results in
> > >
> > > x0:SI=0x123
> > > x0:DI=r104:DI
> > > call [`__asan_load4'] argc:0
> > > x1:SI=[r104:DI]
> > >
> > > The implicit function call added by the address sanitizer clobbers
> > > argument register x0 which was previously set for the register asm
> > > object.
> > >
> > > With the advent of hard register constraints, this can be overcome.
> > > Instead of expanding a register asm assignment directly into a hard
> > > register assignment, keep the register asm object in a pseudo for as
> > > long as possible and use a hard register constraint in Extended Asm
> > > statements, which ensures that the object is eventually allocated to
> > > the respective hard register. Since local register asm is only supposed
> > > to have an effect on Extended Asm statements, this matches the semantics
> > > of hard register constraints, which materialize only for the respective
> > > insn.
> > >
> > > This patch adds support for rewriting local register asm into code
> > > which exploits hard register constraints. For example
> > >
> > > register int global asm ("r3");
> > >
> > > int foo (int x0)
> > > {
> > > register int x asm ("r4") = x0;
> > > register int y asm ("r5");
> > >
> > > asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global));
> > > x += 42;
> > > asm ("baz\t%0,%1" : "=r" (y) : "r" (x));
> > >
> > > return y;
> > > }
> > >
> > > is rewritten during gimplification into
> > >
> > > register int global asm ("r3");
> > >
> > > int foo (int x0)
> > > {
> > > register int tmp asm ("r5");
> > > int x = x0;
> > > int y = tmp;
> > >
> > > asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global));
> > > x += 42;
> > > asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x));
> > >
> > > return y;
> > > }
> > >
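A minimal standalone sketch of the per-operand constraint rewrite, written in plain C rather than against GCC internals. It assumes a hypothetical toy target in which constraint 'r' covers registers named r0..r15 and 'f' covers f0..f15; the helpers constraint_entails_reg and rewrite_constraint are illustrative names that do not exist in GCC. Unlike the patch, which walks possibly multi-letter constraints with CONSTRAINT_LEN and queries the target through lookup_constraint and reg_class_for_constraint, this sketch treats each constraint character on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy target: constraint 'r' denotes a class containing every
   register whose name starts with 'r' (r0..r15), and 'f' one containing
   every register whose name starts with 'f' (f0..f15).  Any other character
   ('=', '+', '&', digits, 'm', ...) is not treated as a register class.  */
static bool
constraint_entails_reg (char c, const char *regname)
{
  if (c == 'r' || c == 'f')
    return regname[0] == c;
  return false;
}

/* Copy CONSTRAINT into OUT (of size OUTSZ), replacing every register class
   letter whose class contains REGNAME by "{REGNAME}".  All other characters
   are copied verbatim.  Returns false if OUT is too small.  */
static bool
rewrite_constraint (const char *constraint, const char *regname,
                    char *out, size_t outsz)
{
  assert (constraint && regname && out && outsz > 0);
  size_t n = 0;
  size_t rlen = strlen (regname);
  for (const char *p = constraint; *p; ++p)
    {
      if (constraint_entails_reg (*p, regname))
        {
          /* Room for '{', the name, '}' and the final NUL.  */
          if (n + rlen + 2 >= outsz)
            return false;
          out[n++] = '{';
          memcpy (out + n, regname, rlen);
          n += rlen;
          out[n++] = '}';
        }
      else
        {
          if (n + 1 >= outsz)
            return false;
          out[n++] = *p;
        }
    }
  out[n] = '\0';
  return true;
}
```

In this model "=r" for r4 becomes "={r4}", the matching constraint "0" is copied unchanged, and "=r" for f5 stays "=r" because the toy 'r' class does not contain f5, mirroring the "leave the constraint alone" behaviour described in the patch notes.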
> > > Note that uninitialized register asm objects may be used as inputs.
> > > Thus, a naive translation into hard register constraints would introduce
> > > reads from uninitialized objects. The init-regs pass would fix those up,
> > > which in turn would mean that once hard register constraints
> > > materialize, the respective registers would be zeroed (see the comment
> > > in gimplify.cc for more details). This is solved by initializing every
> > > uninitialized register asm object with a fresh register asm object,
> > > ensuring that it contains the respective register value. Subsequent
> > > passes remove dead stores in case those objects are eventually
> > > initialized at later points or are used exclusively as output operands.
> > > Therefore, in most cases, those temporary register asm objects won't
> > > materialize. This is not pretty at all, but it is required in order to
> > > compile real-world applications such as glibc for target powerpc64le.
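As a rough illustration of why the temporary usually disappears, the following sketch models straight-line code only: the inserted initialization "x = tmp" is needed only if the first access to the demoted object is a read. This is a simplified model under stated assumptions, not how GCC's dead store or dead code elimination works; the enum and function names are hypothetical, and an in-out operand "+r" is modelled as a read followed by a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kinds of access to a demoted register asm object (toy model).  */
enum access_kind { ACCESS_READ, ACCESS_WRITE };

/* ACCESSES lists every access to the demoted object X in program order
   (straight-line code only).  Return true if the implicit initialization
   "x = tmp" is live, i.e. X is read before it is first written.  An output
   operand "=r" is a WRITE; an in-out operand "+r" is a READ followed by a
   WRITE.  */
static bool
implicit_init_is_live (const enum access_kind *accesses, size_t n)
{
  assert (accesses != NULL || n == 0);
  /* Only the first access matters: a leading write kills the
     initialization, a leading read consumes it.  */
  return n > 0 && accesses[0] == ACCESS_READ;
}
```

For "int y = tmp;" followed by an asm with output "={r5}" (y), the first access to y is a write, so the initialization is a dead store, which matches the foo example where y is used only as an output before being returned.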
> > >
> > > Hard register constraints are stricter in order to prevent subtle
> > > bugs. This in turn means that certain programs are no longer valid
> > > after register asm demotion. For example,
> > >
> > > register int x asm ("r5") = 42;
> > > asm ("" : "+r" (x) : "r" (x));
> > >
> > > is rewritten into
> > >
> > > int x = 42;
> > > asm ("" : "+{r5}" (x) : "{r5}" (x));
> > >
> > > Now, two inputs (the input half of the in-out operand and the explicit
> > > input) refer to the very same register, which is invalid. This example
> > > could be massaged to fit; however, there are other examples which
> > > cannot. Currently, I lean towards rejecting those instead of fixing
> > > them up, since they look like subtle bugs.
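The conflict can be detected mechanically once constraints have been rewritten. A minimal sketch, assuming each operand carries at most one "{reg}" group and ignoring multi-register modes and alternatives; hard_reg_of, inputs_share_hard_reg and MAX_OPERANDS are hypothetical names, and this is not GCC's actual implementation of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPERANDS 64

/* Copy the register name of the first "{...}" group in CONSTRAINT (e.g. "r5"
   from "+{r5}") into BUF.  Return false if there is no such group or BUF is
   too small.  */
static bool
hard_reg_of (const char *constraint, char *buf, size_t bufsz)
{
  const char *open = strchr (constraint, '{');
  const char *close = open ? strchr (open, '}') : NULL;
  if (close == NULL)
    return false;
  size_t len = (size_t) (close - open - 1);
  if (len >= bufsz)
    return false;
  memcpy (buf, open + 1, len);
  buf[len] = '\0';
  return true;
}

/* Return true if two operands read by the asm are bound to the same hard
   register.  Operands read are all INPUTS plus every output marked '+',
   since an in-out operand is an input as well.  */
static bool
inputs_share_hard_reg (const char *const *outputs, size_t nout,
                       const char *const *inputs, size_t nin)
{
  const char *in_ops[MAX_OPERANDS];
  size_t n = 0;
  assert (nout + nin <= MAX_OPERANDS);
  for (size_t i = 0; i < nout; ++i)
    if (outputs[i][0] == '+')
      in_ops[n++] = outputs[i];
  for (size_t i = 0; i < nin; ++i)
    in_ops[n++] = inputs[i];

  /* Pairwise comparison is fine for the handful of operands an asm has.  */
  for (size_t i = 0; i < n; ++i)
    for (size_t j = i + 1; j < n; ++j)
      {
        char a[16], b[16];
        if (hard_reg_of (in_ops[i], a, sizeof a)
            && hard_reg_of (in_ops[j], b, sizeof b)
            && strcmp (a, b) == 0)
          return true;
      }
  return false;
}
```

With outputs { "+{r5}" } and inputs { "{r5}" } this reports a conflict, while an "={r5}" output alone does not; two explicit "{r5}" inputs conflict regardless of the outputs.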
> >
> > Hmm, but local hardregs are supposed to be used only by extended
> > asm as a way to constrain inputs. With hardreg constraints they should
> > no longer be necessary. So - shouldn't we take the more aggressive
> > approach and diagnose them as being deprecated and point to
> > hardreg constraints? Can we even rewrite uses to hardreg constraints
> > (by rewriting the asm regs into SSA which, I think, we currently avoid)?
>
> The whole point of this patch is to automatically rewrite register asm
> objects into ordinary objects utilizing hard register constraints so
> that old code which still depends on register asm benefits from hard
> register constraints.
Oops, I didn't look at the patch and inferred the wrong idea about what it
does from the description. It seems to do exactly what I was
suggesting.
> However, for hard register constraints I have
> been much stricter; in other words, register asm gives you more
> freedom. Therefore, automatically translating it into hard register
> constraints will certainly fail here and there. You could argue that
> the example from above could be massaged to fit hard
> register constraints. However, I was questioning whether it is
> worthwhile to implement this kind of logic: with register asm it is
> easy to come up with code which is hard or even impossible to massage,
> since, to my knowledge, the semantics are not clearly defined. For
> example
>
> register int x asm ("r5") = 42;
> register int y asm ("r5") = 24;
> asm ("" : "=r" (x) : "r" (x), "r" (y));
Uh ...
> This kind of code is currently accepted by gcc and clang. My gut
> feeling is that it shouldn't have been accepted and is rather a side
> effect of the implementation, but I might be completely wrong here. That
> being said, with this patch, if flag -fdemote-register-asm is
> specified, then the code is rewritten into
>
> int x = 42;
> int y = 24;
> asm ("" : "={r5}" (x) : "{r5}" (x), "{r5}" (y));
>
> for which we error out since we have two inputs bound to the very same
> register. Since all those examples look like subtle bugs to me, I think
> it is better to diagnose them, which is what the current implementation
> does (although I think the error message could be improved here and there).
Maybe we should, for extra clarity, rename -fdemote-register-asm to
-fstrict-register-asm and document that it is meant to eventually become
the default. Could we run the demotion in analysis-only mode by default
and diagnose cases like the above? Or is there no way to achieve this?
Thanks,
Richard.
> Cheers,
> Stefan
>
> >
> > Richard.
> >
> > >
> > > Since I consider this an experimental feature, it is hidden behind the
> > > new flag -fdemote-register-asm.
> > > ---
> > >
> > > Notes:
> > > Patch v2 vs this one
> > > --------------------
> > >
> > > Patch v2
> > >
> > > https://inbox.sourceware.org/gcc-patches/[email protected]/
> > > was the last one I posted.
> > >
> > > This patch preserves the current behaviour if the register of a
> > > register asm object is not contained in the register class of a
> > > corresponding constraint. Although I would rather have thrown an
> > > error, since this looks like a subtle bug to me, I kept this behaviour
> > > for the sake of compatibility.
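For reference, the entailment test which decides whether a constraint is rewritten at all (rclass_entails_registers in the patch) can be mirrored outside GCC with a plain bitmask. This sketch assumes at most 64 hard registers and uses a uint64_t in place of GCC's HARD_REG_SET; the register numbering in the usage below is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy hard register set: bit I is set iff hard register I is in the class.
   Assumes at most 64 hard registers; GCC uses HARD_REG_SET instead.  */
typedef uint64_t hard_reg_set;

/* Same logic as rclass_entails_registers in the patch: return true iff all
   NREGS consecutive hard registers starting at REGNO are in CLASS_SET.  */
static bool
class_entails_registers (hard_reg_set class_set, int regno, int nregs)
{
  assert (regno >= 0 && nregs >= 1 && regno + nregs <= 64);
  for (int i = regno; i < regno + nregs; ++i)
    if (!(class_set & ((hard_reg_set) 1 << i)))
      return false;
  return true;
}
```

With a toy general register class covering registers 0..15, a value needing two registers starting at 15 is rejected because register 16 lies outside the class; that is exactly the case in which the constraint is left unchanged and only the object is demoted.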
> > >
> > > Furthermore, this patch also deals with uninitialized reads of
> > > register asm objects. Again, this rather sounds like a bug to me, but
> > > for the sake of compatibility I kept this behaviour.
> > >
> > > Testing
> > > -------
> > >
> > > Bootstrapped on
> > > - aarch64-unknown-linux-gnu
> > > - powerpc64le-unknown-linux-gnu
> > > - s390x-ibm-linux-gnu
> > > - x86_64-pc-linux-gnu
> > >
> > > Built and regtested glibc on
> > > - powerpc64le-unknown-linux-gnu
> > > - s390x-ibm-linux-gnu
> > > - x86_64-pc-linux-gnu
> > >
> > > Built Linux on
> > > - aarch64-unknown-linux-gnu
> > > - s390x-ibm-linux-gnu
> > > - x86_64-pc-linux-gnu (*)
> > >
> > > (*) On x86_64, the Linux kernel contains a use of register asm which
> > > leads to an error. The usage pattern is similar to the one described
> > > in the commit message
> > >
> > > register int x asm ("r5") = 42;
> > > asm ("" : "+r" (x) : "r" (x));
> > >
> > > For testing purposes, I replaced the in-out operand by an output
> > > operand, and the Linux kernel then compiles fine on x86_64. For more
> > > details, see:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662725.html
> > >
> > > Ok for mainline?
> > >
> > > gcc/common.opt | 4 +
> > > gcc/gimplify.cc | 186 ++++++++++++++++++
> > > .../gcc.dg/asm-hard-reg-demotion-1.c | 74 +++++++
> > > 3 files changed, 264 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > >
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index 3ad1444cc88..f0bc8492190 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3588,6 +3588,10 @@ fverbose-asm
> > > Common Var(flag_verbose_asm)
> > > Add extra commentary to assembler output.
> > >
> > > +fdemote-register-asm
> > > +Common Var(flag_demote_register_asm) Init(0)
> > > +Demote local register asm and use hard register constraints instead.
> > > +
> > > fvisibility=
> > > Common Joined RejectNegative Enum(symbol_visibility) Var(default_visibility) Init(VISIBILITY_DEFAULT)
> > > -fvisibility=[default|internal|hidden|protected] Set the default symbol visibility.
> > > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > > index e4db4b1d9bd..c6560c14f48 100644
> > > --- a/gcc/gimplify.cc
> > > +++ b/gcc/gimplify.cc
> > > @@ -2246,6 +2246,40 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> > > && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
> > > gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
> > > }
> > > + else if (flag_demote_register_asm && !init && DECL_HARD_REGISTER (decl))
> > > + {
> > > + /* Register asm objects may be used uninitialized as inputs.
> > > + Therefore, a naive translation into hard register constraints
> > > + would render the demoted objects to be default initialized by
> > > + init-regs and as a consequence, once hard register constraints
> > > + materialize for an Extended Asm, hard registers would be zeroed
> > > + which didn't happen before. Therefore, to overcome this,
> > > + initialize each demoted object with the current contents of the
> > > + hard register it previously referred to. For example, translate
> > > +
> > > + register int x asm ("r5");
> > > + int y;
> > > + asm ("..." : "=r" (y) : "r" (x));
> > > +
> > > + to
> > > +
> > > + register int tmp asm ("r5");
> > > + int x = tmp;
> > > + int y;
> > > + asm ("..." : "=r" (y) : "{r5}" (x));
> > > +
> > > + Do this unconditionally for all uninitialized register asm objects
> > > + and let subsequent passes remove dead stores in case those objects
> > > + are initialized at later points or are used exclusively as output
> > > + operands. */
> > > + tree tmp = create_tmp_var (TREE_TYPE (decl), "regasm");
> > > + SET_DECL_ASSEMBLER_NAME (tmp, DECL_ASSEMBLER_NAME (decl));
> > > + DECL_REGISTER (tmp) = 1;
> > > + DECL_HARD_REGISTER (tmp) = 1;
> > > + DECL_INITIAL (decl) = tmp;
> > > + *stmt_p = stmt;
> > > + return gimplify_decl_expr (stmt_p, seq_p);
> > > + }
> > > }
> > >
> > > return GS_ALL_DONE;
> > > @@ -7976,6 +8010,137 @@ num_alternatives (const_tree link)
> > > return num + 1;
> > > }
> > >
> > > +static inline bool
> > > +rclass_entails_registers (enum reg_class rclass, int regno, int nregs)
> > > +{
> > > + for (int i = regno; i < regno + nregs; ++i)
> > > + if (!TEST_HARD_REG_BIT (reg_class_contents[rclass], i))
> > > + return false;
> > > + return true;
> > > +}
> > > +
> > > +/* Keep track of all register asm which have been replaced by hard register
> > > + constraints. After all asm statements of a function have been processed,
> > > + demote those to ordinary objects. */
> > > +static hash_set<tree> demote_register_asm;
> > > +
> > > +/* Rewrite constraints of Extended Asm operands which refer to local register
> > > + asm objects into hard register constraints. Also mark those objects to be
> > > + demoted from register asm objects to ordinary objects which is done
> > > + basically after gimplification of the function body.
> > > +
> > > + For example, the following translation unit
> > > +
> > > + register int global asm ("r3");
> > > +
> > > + int foo (int x0)
> > > + {
> > > + register int x asm ("r4") = x0;
> > > + register int y asm ("r5");
> > > +
> > > + asm ("..." : "=r" (x) : "0" (x), "r" (global));
> > > + x += 42;
> > > + asm ("..." : "=r" (y) : "r" (x));
> > > +
> > > + return y;
> > > + }
> > > +
> > > + is rewritten into
> > > +
> > > + register int global asm ("r3");
> > > +
> > > + int foo (int x0)
> > > + {
> > > + register int tmp asm ("r5");
> > > + int x = x0;
> > > + int y = tmp;
> > > +
> > > + asm ("..." : "={r4}" (x) : "0" (x), "r" (global));
> > > + x += 42;
> > > + asm ("..." : "={r5}" (y) : "{r4}" (x));
> > > +
> > > + return y;
> > > + }
> > > +
> > > + Any local register asm which is not initialized at its declaration is
> > > + implicitly initialized with the contents of the respective hard register.
> > > + See gimplify_decl_expr() for more details. Ideally we would error out in
> > > + those cases; however, for the sake of compatibility, keep the current
> > > + behaviour.
> > > +
> > > + Note, only rewrite a constraint in case it entails the registers referred to
> > > + by the corresponding register asm object. For example, assume that register
> > > + f5 is a floating-point register and is therefore not included in the
> > > + register class associated with constraint r.
> > > +
> > > + register float x asm ("f5");
> > > + asm ("..." : "=r" (x));
> > > +
> > > + Then the constraint is not altered. However, the register asm object is
> > > + still demoted to an ordinary object, which means we finally end up with
> > > +
> > > + float x;
> > > + asm ("..." : "=r" (x));
> > > +
> > > + Ideally we would error out here since this rather looks like a bug; however,
> > > + for the sake of compatibility, preserve the current behaviour of register
> > > + asm. */
> > > +
> > > +static void
> > > +gimplify_demote_register_asm (tree link)
> > > +{
> > > + tree op = TREE_VALUE (link);
> > > + if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op))
> > > + return;
> > > + tree id = DECL_ASSEMBLER_NAME (op);
> > > + const char *regname = IDENTIFIER_POINTER (id);
> > > + ++regname;
> > > + int regno = decode_reg_name (regname);
> > > + if (regno < 0)
> > > + /* This indicates an error and we error out later on. */
> > > + return;
> > > + /* Currently, fixed registers cannot be used for hard register constraints
> > > + which is why we skip those for the moment. */
> > > + if (fixed_regs[regno])
> > > + return;
> > > + machine_mode mode = TYPE_MODE (TREE_TYPE (op));
> > > + int nregs = hard_regno_nregs (regno, mode);
> > > + const char *constraint
> > > + = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
> > > + auto_vec<char, 64> constraint_new;
> > > + for (const char *p = constraint; *p; )
> > > + {
> > > + bool changed_p = false;
> > > + enum constraint_num cn = lookup_constraint (p);
> > > + enum reg_class rclass = reg_class_for_constraint (cn);
> > > + if (rclass != NO_REGS && rclass_entails_registers (rclass, regno, nregs))
> > > + {
> > > + /* At this point we have a constraint which entails all the registers
> > > + required by the register asm operand. Therefore, rewrite the
> > > + constraint into a corresponding hard register constraint. */
> > > + constraint_new.safe_push ('{');
> > > + size_t len = strlen (regname);
> > > + for (size_t i = 0; i < len; ++i)
> > > + constraint_new.safe_push (regname[i]);
> > > + constraint_new.safe_push ('}');
> > > + changed_p = true;
> > > + }
> > > +
> > > + for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++)
> > > + {
> > > + if (!changed_p)
> > > + constraint_new.safe_push (*p);
> > > + if (*p == '\0')
> > > + break;
> > > + }
> > > + }
> > > + constraint_new.safe_push ('\0');
> > > + unsigned int len = constraint_new.length ();
> > > + tree str = build_string (len, constraint_new.address ());
> > > + TREE_VALUE (TREE_PURPOSE (link)) = str;
> > > + demote_register_asm.add (op);
> > > +}
> > > +
> > > /* Gimplify the operands of an ASM_EXPR. Input operands should be a gimple
> > > value; output operands should be a gimple lvalue. */
> > >
> > > @@ -8372,6 +8537,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
> > > /* Do not add ASMs with errors to the gimple IL stream. */
> > > if (ret != GS_ERROR)
> > > {
> > > + if (flag_demote_register_asm)
> > > + {
> > > + for (unsigned i = 0; i < vec_safe_length (outputs); ++i)
> > > + {
> > > + tree link = (*outputs)[i];
> > > + gimplify_demote_register_asm (link);
> > > + }
> > > + for (unsigned i = 0; i < vec_safe_length (inputs); ++i)
> > > + {
> > > + tree link = (*inputs)[i];
> > > + gimplify_demote_register_asm (link);
> > > + }
> > > + }
> > > +
> > > stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
> > > inputs, outputs, clobbers, labels);
> > >
> > > @@ -21874,6 +22053,13 @@ gimplify_body (tree fndecl, bool do_parms)
> > > }
> > > }
> > >
> > > + for (auto op : demote_register_asm)
> > > + {
> > > + DECL_REGISTER (op) = 0;
> > > + DECL_HARD_REGISTER (op) = 0;
> > > + }
> > > + demote_register_asm.empty ();
> > > +
> > > if ((flag_openacc || flag_openmp || flag_openmp_simd)
> > > && gimplify_omp_ctxp)
> > > {
> > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > > new file mode 100644
> > > index 00000000000..851adb3af40
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > > @@ -0,0 +1,74 @@
> > > +/* { dg-do compile { target aarch64*-*-* s390*-*-* x86_64-*-* } } */
> > > +/* { dg-additional-options "-fdemote-register-asm -fdump-tree-gimple" } */
> > > +/* { dg-additional-options "-msse2" { target x86_64-*-* } } */
> > > +
> > > +#if __aarch64__
> > > +# define GPR "r5"
> > > +# define FPR "d5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "w"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{d5\}\" x1\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=w\" x2\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +#elif __s390__
> > > +# define GPR "r5"
> > > +# define FPR "f5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "f"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{f5\}\" x1\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=f\" x2\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +#elif __x86_64__
> > > +# define GPR "cx"
> > > +# define FPR "xmm5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "x"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{cx\}\" x0\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{xmm5\}\" x1\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=x\" x2\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +#else
> > > +# error unsupported target
> > > +#endif
> > > +
> > > +/* Rewrite constraints into hard register constraints and demote register asm
> > > + objects into ordinary objects. */
> > > +
> > > +int
> > > +test_gpr_constraint_gpr_register (void)
> > > +{
> > > + register int x0 __asm__ (GPR);
> > > + __asm__ ("" : "="CSTR_GPR (x0));
> > > + return x0;
> > > +}
> > > +
> > > +float
> > > +test_fpr_constraint_fpr_register (void)
> > > +{
> > > + register float x1 __asm__ (FPR);
> > > + __asm__ ("" : "="CSTR_FPR (x1));
> > > + return x1;
> > > +}
> > > +
> > > +/* The following two tests are unusual in the sense that the register is not
> > > + subsumed by the constraint. Keep the current behaviour by not changing the
> > > + constraints and only demote the register asm objects into ordinary objects.
> > > + Erroring out would probably be better since this could be a subtle bug. */
> > > +
> > > +int
> > > +test_fpr_constraint_gpr_register (void)
> > > +{
> > > + register int x2 __asm__ (GPR);
> > > + __asm__ ("" : "="CSTR_FPR (x2));
> > > + return x2;
> > > +}
> > > +
> > > +float
> > > +test_gpr_constraint_fpr_register (void)
> > > +{
> > > + register float x3 __asm__ (FPR);
> > > + __asm__ ("" : "="CSTR_GPR (x3));
> > > + return x3;
> > > +}
> > > --
> > > 2.53.0
> > >