On Tue, May 19, 2026 at 2:26 PM Stefan Schulze Frielinghaus
<[email protected]> wrote:
>
> On Tue, May 19, 2026 at 08:12:35AM +0200, Richard Biener wrote:
> > On Mon, May 18, 2026 at 4:33 PM Stefan Schulze Frielinghaus
> > <[email protected]> wrote:
> > >
> > > From: Stefan Schulze Frielinghaus <[email protected]>
> > >
> > > Currently local register asm assignments materialize during expand into
> > > assignments utilizing hard registers.  Since hard registers or more
> > > precisely objects residing in hard registers are not tracked
> > > individually, those are subject to be clobbered.  Well known and
> > > documented are function calls which may clobber hard registers used for
> > > register asm objects.  For example, compiling on aarch64
> > >
> > > register int x asm ("x0") = 0x123;
> > > register int y asm ("x1") = *ptr;
> > >
> > > using address sanitizers results in
> > >
> > > x0:SI=0x123
> > > x0:DI=r104:DI
> > > call [`__asan_load4'] argc:0
> > > x1:SI=[r104:DI]
> > >
> > > The implicit function call added by the address sanitizer clobbers
> > > argument register x0 which was previously set for the register asm
> > > object.
> > >
> > > With the advent of hard register constraints, this can be overcome.
> > > Instead of expanding a register asm assignment directly into a hard
> > > register assignment, keep the register asm object in a pseudo for as
> > > long as possible and use a hard register constraint in Extended Asm
> > > statements which ensures that the object is finally allocated the
> > > respective hard register.  Since local register asm is supposed to have
> > > an effect only for Extended Asm statements, this coincides with hard
> > > register constraints which materialize for the respective insn.
> > >
> > > This patch adds the feature of rewriting local register asm into code
> > > which exploits hard register constraints.  For example
> > >
> > > register int global asm ("r3");
> > >
> > > int foo (int x0)
> > > {
> > >   register int x asm ("r4") = x0;
> > >   register int y asm ("r5");
> > >
> > >   asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global));
> > >   x += 42;
> > >   asm ("baz\t%0,%1" : "=r" (y) : "r" (x));
> > >
> > >   return y;
> > > }
> > >
> > > is rewritten during gimplification into
> > >
> > > register int global asm ("r3");
> > >
> > > int foo (int x0)
> > > {
> > >   register int tmp asm ("r5");
> > >   int x = x0;
> > >   int y = tmp;
> > >
> > >   asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global));
> > >   x += 42;
> > >   asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x));
> > >
> > >   return y;
> > > }
> > >
> > > Note, uninitialized register asm objects may be used as inputs.  Thus,
> > > if naively translated into hard register constraints, this would
> > > introduce reads from uninitialized objects which init-regs pass would
> > > fix up which in turn would mean that once hard register constraints
> > > materialize, respective registers would be zeroed (see comment in
> > > gimplify.cc for more details).  This is solved by initializing every
> > > uninitialized register asm object by a fresh register asm object
> > > ensuring that it contains the respective register value.  Subsequent
> > > passes remove dead stores in case those objects are eventually
> > > initialized at later points or are used exclusively as output operands.
> > > Therefore, in most cases, those temporary register asm objects won't
> > > > materialize.  This is not pretty at all but required in order to compile
> > > > real world applications such as glibc for target powerpc64le.
> > >
> > > Hard register constraints are more strict in order to prevent subtle
> > > bugs.  This in turn means that certain programs are not valid after
> > > register asm demotion.  For example,
> > >
> > > register int x asm ("r5") = 42;
> > > asm ("" : "+r" (x) : "r" (x));
> > >
> > > is rewritten into
> > >
> > > int x = 42;
> > > asm ("" : "+{r5}" (x) : "{r5}" (x));
> > >
> > > Now, two inputs refer to the very same register which is invalid.  This
> > > example could have been massaged to make it fit, however, there are
> > > other examples which cannot.  Currently, I lean towards rejecting those
> > > instead of fixing up, since those look like subtle bugs.
> >
> > Hmm, but local hardregs are supposed to be only used by extended
> > asm as a way to constrain inputs.  With hardreg constraints they should
> > no longer be necessary.  So - shouldn't we take the more aggressive
> > approach and diagnose them as being deprecated and point to
> > hardreg constraints?  Can we even rewrite uses to hardreg constraints
> > (by rewriting the asm regs into SSA which, I think, we currently avoid)?
>
> The whole point of this patch is to automatically rewrite register asm
> objects into ordinary objects utilizing hard register constraints so
> that old code which is still depending on register asm profits from hard
> register constraints.

Oops, I didn't look at the patch and inferred a wrong idea about what it
does from the description.  It seems to be exactly doing what I was
suggesting.
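
To make the rewrite concrete, here is a toy, standalone C sketch of the
constraint transformation described in the commit message.  It is not the
patch's code: the name toy_demote_constraint is made up, and it assumes that
only the letter 'r' names a register class containing the target register,
whereas the patch consults lookup_constraint and reg_class_for_constraint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical helper, not part of GCC): rewrite every 'r'
   constraint letter in CONSTRAINT into the hard register constraint
   "{REGNAME}" and copy everything else (modifiers such as '=' or '+',
   matching digits, other letters) verbatim.
   Returns 0 on success, -1 if OUT is too small.  */
int
toy_demote_constraint (const char *constraint, const char *regname,
                       char *out, size_t out_size)
{
  assert (constraint && regname && out);
  if (out_size == 0)
    return -1;

  size_t reglen = strlen (regname);
  size_t n = 0;
  for (const char *p = constraint; *p; ++p)
    {
      /* "{" + regname + "}" for 'r', a single character otherwise.  */
      size_t need = (*p == 'r') ? reglen + 2 : 1;
      /* Always leave room for the terminating NUL.  */
      if (n + need + 1 > out_size)
        return -1;
      if (*p == 'r')
        {
          out[n++] = '{';
          memcpy (out + n, regname, reglen);
          n += reglen;
          out[n++] = '}';
        }
      else
        out[n++] = *p;
    }
  out[n] = '\0';
  return 0;
}
```

Under these assumptions, passing "=r" and "r4" yields "={r4}", while a
matching constraint such as "0" is copied unchanged.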

> However, for hard register constraints I have
> been way more strict or in other words with register asm you have more
> freedom.  Therefore, automatically translating those into hard register
> constraints will certainly fail here and there.  You could argue that
> the example from above could be massaged to make it fit with hard
> register constraints.  However, I was questioning whether it is
> worthwhile to implement this kind of logic since with register asm it is
> easy to come up with code which cannot be massaged easily or even at all
> since the semantics are, to my knowledge, not clearly defined.  For
> example
>
> register int x asm ("r5") = 42;
> register int y asm ("r5") = 24;
> asm ("" : "=r" (x) : "r" (x), "r" (y));

Uh ...

> This kind of code is accepted by gcc/clang at the moment.  My gut
> feeling is that this shouldn't have been accepted and is rather a side
> effect of the implementation but I might be completely wrong here.  That
> being said with this patch, if flag -fdemote-register-asm is
> specified, then the code is rewritten into
>
> x = 42;
> y = 24;
> asm ("" : "={r5}" (x) : "{r5}" (x), "{r5}" (y));
>
> for which we error out since we have two inputs bound to the very same
> register.  Since all those examples look like subtle bugs to me, I think
> it is better to diagnose those, which is what the current implementation does
> (although I think the error message could be improved here and there).

Maybe we should, for extra clarity, name -fdemote-register-asm as
-fstrict-register-asm and document it to be eventually the default.  Could
we run the demotion analysis-only by default and diagnose cases like the
above?  Or is there no way to achieve this?
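
Just to illustrate the kind of check I have in mind, here is a minimal
standalone sketch.  Everything in it is an assumption for illustration: the
name toy_first_conflicting_input is hypothetical, operands are modelled as
plain register-name strings (NULL when an input has no hard register
constraint), and this is not how GCC represents asm operands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC.  REGS[i] is the hard register
   name bound to input operand i, or NULL if that input is not bound to a
   hard register.  Returns the index of the first input which reuses a
   register already bound to an earlier input, or -1 if there is none.  */
int
toy_first_conflicting_input (const char *const *regs, size_t n)
{
  assert (regs != NULL || n == 0);
  for (size_t i = 1; i < n; ++i)
    for (size_t j = 0; j < i; ++j)
      if (regs[i] && regs[j] && strcmp (regs[i], regs[j]) == 0)
        return (int) i;
  return -1;
}
```

For the example above both inputs would be bound to r5, so the sketch
reports operand 1; an analysis-only mode could turn such a result into a
warning instead of an error.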

Thanks,
Richard.

> Cheers,
> Stefan
>
> >
> > Richard.
> >
> > >
> > > Since I consider this as an experimental feature it is hidden behind new
> > > flag -fdemote-register-asm.
> > > ---
> > >
> > > Notes:
> > >     Patch v2 vs this one
> > >     --------------------
> > >
> > >     Patch v2
> > >     https://inbox.sourceware.org/gcc-patches/[email protected]/
> > >     was the last one I posted.
> > >
> > >     This patch keeps the behaviour if a register of a register asm
> > >     object is not entailed in the register class of a corresponding
> > >     constraint.
> > >     Although, I would have rather liked to throw an error since this looks
> > >     like a subtle bug to me, I kept this behaviour for the sake of
> > >     compatibility.
> > >
> > >     Furthermore, this patch also deals with uninitialized reads of
> > >     register asm objects.  Again, this rather sounds like a bug to me
> > >     but for the sake of compatibility I kept this behaviour.
> > >
> > >     Testing
> > >     -------
> > >
> > >     Bootstrapped on
> > >     - aarch64-unknown-linux-gnu
> > >     - powerpc64le-unknown-linux-gnu
> > >     - s390x-ibm-linux-gnu
> > >     - x86_64-pc-linux-gnu
> > >
> > >     Built and regtested glibc on
> > >     - powerpc64le-unknown-linux-gnu
> > >     - s390x-ibm-linux-gnu
> > >     - x86_64-pc-linux-gnu
> > >
> > >     Built Linux on
> > >     - aarch64-unknown-linux-gnu
> > >     - s390x-ibm-linux-gnu
> > >     - x86_64-pc-linux-gnu (*)
> > >
> > >     (*) For x86_64 the Linux kernel exposes a usage of register asm which
> > >     leads to an error.   The usage pattern is similar to the one described
> > >     in the commit message
> > >
> > >         register int x asm ("r5") = 42;
> > >         asm ("" : "+r" (x) : "r" (x));
> > >
> > >     For testing purposes I replaced the in-out operand by an out
> > >     operand and the Linux kernel compiles fine on x86_64.  For more
> > >     details, see:
> > >     https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662725.html
> > >
> > >     Ok for mainline?
> > >
> > >  gcc/common.opt                                |   4 +
> > >  gcc/gimplify.cc                               | 186 ++++++++++++++++++
> > >  .../gcc.dg/asm-hard-reg-demotion-1.c          |  74 +++++++
> > >  3 files changed, 264 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > >
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index 3ad1444cc88..f0bc8492190 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3588,6 +3588,10 @@ fverbose-asm
> > >  Common Var(flag_verbose_asm)
> > >  Add extra commentary to assembler output.
> > >
> > > +fdemote-register-asm
> > > +Common Var(flag_demote_register_asm) Init(0)
> > > +Demote local register asm and use hard register constraints instead.
> > > +
> > >  fvisibility=
> > >  Common Joined RejectNegative Enum(symbol_visibility) Var(default_visibility) Init(VISIBILITY_DEFAULT)
> > >  -fvisibility=[default|internal|hidden|protected]       Set the default symbol visibility.
> > > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > > index e4db4b1d9bd..c6560c14f48 100644
> > > --- a/gcc/gimplify.cc
> > > +++ b/gcc/gimplify.cc
> > > @@ -2246,6 +2246,40 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> > >               && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
> > >             gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
> > >         }
> > > +      else if (flag_demote_register_asm && !init && DECL_HARD_REGISTER (decl))
> > > +       {
> > > +         /* Register asm objects may be used uninitialized as inputs.
> > > +            Therefore, a naive translation into hard register constraints
> > > +            would render the demoted objects to be default initialized by
> > > +            init-regs and as a consequence, once hard register constraints
> > > +            materialize for an Extended Asm, hard registers would be zeroed
> > > +            which didn't happen before.  Therefore, to overcome this,
> > > +            initialize each demoted object with the current contents of the
> > > +            hard register it previously referred to.  For example, translate
> > > +
> > > +            register int x asm ("r5");
> > > +            int y;
> > > +            asm ("..." : "=r" (y) : "r" (x));
> > > +
> > > +            to
> > > +
> > > +            register int tmp asm ("r5");
> > > +            int x = tmp;
> > > +            int y;
> > > +            asm ("..." : "=r" (y) : "{r5}" (x));
> > > +
> > > +            Do this unconditionally for all uninitialized register asm objects
> > > +            and let subsequent passes remove dead stores in case those objects
> > > +            are initialized at later points or are used exclusively as output
> > > +            operands.  */
> > > +         tree tmp = create_tmp_var (TREE_TYPE (decl), "regasm");
> > > +         SET_DECL_ASSEMBLER_NAME (tmp, DECL_ASSEMBLER_NAME (decl));
> > > +         DECL_REGISTER (tmp) = 1;
> > > +         DECL_HARD_REGISTER (tmp) = 1;
> > > +         DECL_INITIAL (decl) = tmp;
> > > +         *stmt_p = stmt;
> > > +         return gimplify_decl_expr (stmt_p, seq_p);
> > > +       }
> > >      }
> > >
> > >    return GS_ALL_DONE;
> > > @@ -7976,6 +8010,137 @@ num_alternatives (const_tree link)
> > >    return num + 1;
> > >  }
> > >
> > > +static inline bool
> > > +rclass_entails_registers (enum reg_class rclass, int regno, int nregs)
> > > +{
> > > +  for (int i = regno; i < regno + nregs ; ++i)
> > > +    if (!TEST_HARD_REG_BIT (reg_class_contents[rclass], i))
> > > +      return false;
> > > +  return true;
> > > +}
> > > +
> > > +/* Keep track of all register asm which have been replaced by hard register
> > > +   constraints.  After all asm statements of a function have been processed,
> > > +   demote those to ordinary objects.  */
> > > +static hash_set<tree> demote_register_asm;
> > > +
> > > +/* Rewrite constraints of Extended Asm operands which refer to local register
> > > +   asm objects into hard register constraints.  Also mark those objects to be
> > > +   demoted from register asm objects to ordinary objects which is done
> > > +   basically after gimplification of the function body.
> > > +
> > > +   For example, the following translation unit
> > > +
> > > +   register int global asm ("r3");
> > > +
> > > +   int foo (int x0)
> > > +   {
> > > +     register int x asm ("r4") = x0;
> > > +     register int y asm ("r5");
> > > +
> > > +     asm ("..." : "=r" (x) : "0" (x), "r" (global));
> > > +     x += 42;
> > > +     asm ("..." : "=r" (y) : "r" (x));
> > > +
> > > +     return y;
> > > +   }
> > > +
> > > +   is rewritten into
> > > +
> > > +   register int global asm ("r3");
> > > +
> > > +   int foo (int x0)
> > > +   {
> > > +     register int tmp asm ("r5");
> > > +     int x = x0;
> > > +     int y = tmp;
> > > +
> > > +     asm ("..." : "={r4}" (x) : "0" (x), "r" (global));
> > > +     x += 42;
> > > +     asm ("..." : "={r5}" (y) : "{r4}" (x));
> > > +
> > > +     return y;
> > > +   }
> > > +
> > > +   Any local register asm which is not initialized at its declaration, is
> > > +   implicitly initialized with the contents of the respective hard register.
> > > +   See gimplify_decl_expr() for more details.  Ideally we would error out in
> > > +   those cases, however, for the sake of compatibility keep the current
> > > +   behaviour.
> > > +
> > > +   Note, only rewrite a constraint in case it entails the registers referred to
> > > +   by the corresponding register asm object.  For example, assume that register
> > > +   f5 is a floating-point register and is therefore not included in the
> > > +   register class associated by constraint r.
> > > +
> > > +   register float x asm ("f5");
> > > +   asm ("..." : "=r" (x));
> > > +
> > > +   Then the constraint is not altered.  However, the register asm object is
> > > +   still demoted to an ordinary object which means we finally end up with
> > > +
> > > +   float x;
> > > +   asm ("..." : "=r" (x));
> > > +
> > > +   Ideally we would error out here since this rather looks like a bug, however,
> > > +   for the sake of compatibility, preserve the current behaviour of register
> > > +   asm.  */
> > > +
> > > +static void
> > > +gimplify_demote_register_asm (tree link)
> > > +{
> > > +  tree op = TREE_VALUE (link);
> > > +  if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op))
> > > +    return;
> > > +  tree id = DECL_ASSEMBLER_NAME (op);
> > > +  const char *regname = IDENTIFIER_POINTER (id);
> > > +  ++regname;
> > > +  int regno = decode_reg_name (regname);
> > > +  if (regno < 0)
> > > +    /* This indicates an error and we error out later on.  */
> > > +    return;
> > > +  /* Currently, fixed registers cannot be used for hard register constraints
> > > +     which is why we skip those for the moment.  */
> > > +  if (fixed_regs[regno])
> > > +    return;
> > > +  machine_mode mode = TYPE_MODE (TREE_TYPE (op));
> > > +  int nregs = hard_regno_nregs (regno, mode);
> > > +  const char *constraint
> > > +    = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
> > > +  auto_vec<char, 64> constraint_new;
> > > +  for (const char *p = constraint; *p; )
> > > +    {
> > > +      bool changed_p = false;
> > > +      enum constraint_num cn = lookup_constraint (p);
> > > +      enum reg_class rclass = reg_class_for_constraint (cn);
> > > +      if (rclass != NO_REGS && rclass_entails_registers (rclass, regno, nregs))
> > > +       {
> > > +         /* At this point we have a constraint which entails all the registers
> > > +            required by the register asm operand.  Therefore, rewrite the
> > > +            constraint into a corresponding hard register constraint.  */
> > > +         constraint_new.safe_push ('{');
> > > +         size_t len = strlen (regname);
> > > +         for (size_t i = 0; i < len; ++i)
> > > +           constraint_new.safe_push (regname[i]);
> > > +         constraint_new.safe_push ('}');
> > > +         changed_p = true;
> > > +       }
> > > +
> > > +      for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++)
> > > +       {
> > > +         if (!changed_p)
> > > +           constraint_new.safe_push (*p);
> > > +         if (*p == '\0')
> > > +           break;
> > > +       }
> > > +    }
> > > +  constraint_new.safe_push ('\0');
> > > +  unsigned int len = constraint_new.length ();
> > > +  tree str = build_string (len, constraint_new.address ());
> > > +  TREE_VALUE (TREE_PURPOSE (link)) = str;
> > > +  demote_register_asm.add (op);
> > > +}
> > > +
> > >  /* Gimplify the operands of an ASM_EXPR.  Input operands should be a gimple
> > >     value; output operands should be a gimple lvalue.  */
> > >
> > > @@ -8372,6 +8537,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
> > >    /* Do not add ASMs with errors to the gimple IL stream.  */
> > >    if (ret != GS_ERROR)
> > >      {
> > > +      if (flag_demote_register_asm)
> > > +       {
> > > +         for (unsigned i = 0; i < vec_safe_length (outputs); ++i)
> > > +           {
> > > +             tree link = (*outputs)[i];
> > > +             gimplify_demote_register_asm (link);
> > > +           }
> > > +         for (unsigned i = 0; i < vec_safe_length (inputs); ++i)
> > > +           {
> > > +             tree link = (*inputs)[i];
> > > +             gimplify_demote_register_asm (link);
> > > +           }
> > > +       }
> > > +
> > >        stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
> > >                                    inputs, outputs, clobbers, labels);
> > >
> > > @@ -21874,6 +22053,13 @@ gimplify_body (tree fndecl, bool do_parms)
> > >           }
> > >      }
> > >
> > > +  for (auto op : demote_register_asm)
> > > +    {
> > > +      DECL_REGISTER (op) = 0;
> > > +      DECL_HARD_REGISTER (op) = 0;
> > > +    }
> > > +  demote_register_asm.empty ();
> > > +
> > >    if ((flag_openacc || flag_openmp || flag_openmp_simd)
> > >        && gimplify_omp_ctxp)
> > >      {
> > > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > > new file mode 100644
> > > index 00000000000..851adb3af40
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > > @@ -0,0 +1,74 @@
> > > +/* { dg-do compile { target aarch64*-*-* s390*-*-* x86_64-*-* } } */
> > > +/* { dg-additional-options "-fdemote-register-asm -fdump-tree-gimple" } */
> > > +/* { dg-additional-options "-msse2" { target x86_64-*-* } } */
> > > +
> > > +#if __aarch64__
> > > +# define GPR "r5"
> > > +# define FPR "d5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "w"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{d5\}\" x1\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=w\" x2\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target aarch64-*-* } } } */
> > > +#elif __s390__
> > > +# define GPR "r5"
> > > +# define FPR "f5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "f"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{f5\}\" x1\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=f\" x2\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target s390*-*-* } } } */
> > > +#elif __x86_64__
> > > +# define GPR "cx"
> > > +# define FPR "xmm5"
> > > +# define CSTR_GPR "r"
> > > +# define CSTR_FPR "x"
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{cx\}\" x0\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{xmm5\}\" x1\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=x\" x2\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target x86_64-*-* } } } */
> > > +#else
> > > +# error unsupported target
> > > +#endif
> > > +
> > > +/* Rewrite constraints into hard register constraints and demote register asm
> > > +   objects into ordinary objects.  */
> > > +
> > > +int
> > > +test_gpr_constraint_gpr_register (void)
> > > +{
> > > +  register int x0 __asm__ (GPR);
> > > +  __asm__ ("" : "="CSTR_GPR (x0));
> > > +  return x0;
> > > +}
> > > +
> > > +float
> > > +test_fpr_constraint_fpr_register (void)
> > > +{
> > > +  register float x1 __asm__ (FPR);
> > > +  __asm__ ("" : "="CSTR_FPR (x1));
> > > +  return x1;
> > > +}
> > > +
> > > +/* The following two tests are unusual in the sense that the register is not
> > > +   subsumed by the constraint.  Keep the current behaviour by not changing the
> > > +   constraints and only demote the register asm objects into ordinary objects.
> > > +   Erroring out would be probably better since this could be a subtle bug.  */
> > > +
> > > +int
> > > +test_fpr_constraint_gpr_register (void)
> > > +{
> > > +  register int x2 __asm__ (GPR);
> > > +  __asm__ ("" : "="CSTR_FPR (x2));
> > > +  return x2;
> > > +}
> > > +
> > > +float
> > > +test_gpr_constraint_fpr_register (void)
> > > +{
> > > +  register float x3 __asm__ (FPR);
> > > +  __asm__ ("" : "="CSTR_GPR (x3));
> > > +  return x3;
> > > +}
> > > --
> > > 2.53.0
> > >
