On Tue, May 19, 2026 at 08:12:35AM +0200, Richard Biener wrote:
> On Mon, May 18, 2026 at 4:33 PM Stefan Schulze Frielinghaus
> <[email protected]> wrote:
> >
> > From: Stefan Schulze Frielinghaus <[email protected]>
> >
> > Currently local register asm assignments materialize during expand into
> > assignments utilizing hard registers.  Since hard registers or more
> > precisely objects residing in hard registers are not tracked
> > individually, those are subject to be clobbered.  Well known and
> > documented are function calls which may clobber hard registers used for
> > register asm objects.  For example, compiling on aarch64
> >
> > register int x asm ("x0") = 0x123;
> > register int y asm ("x1") = *ptr;
> >
> > using address sanitizers results in
> >
> > x0:SI=0x123
> > x0:DI=r104:DI
> > call [`__asan_load4'] argc:0
> > x1:SI=[r104:DI]
> >
> > The implicit function call added by the address sanitizer clobbers
> > argument register x0 which was previously set for the register asm
> > object.
> >
> > With the advent of hard register constraints, this can be overcome.
> > Instead of expanding a register asm assignment directly into a hard
> > register assignment, keep the register asm object in a pseudo for as
> > long as possible and use a hard register constraint in Extended Asm
> > statements which ensures that the object is finally allocated the
> > respective hard register.  Since local register asm is supposed to have
> > an effect only for Extended Asm statements, this coincides with hard
> > register constraints which materialize for the respective insn.
> >
> > This patch adds the feature of rewriting local register asm into code
> > which exploits hard register constraints.  For example
> >
> > register int global asm ("r3");
> >
> > int foo (int x0)
> > {
> >   register int x asm ("r4") = x0;
> >   register int y asm ("r5");
> >
> >   asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global));
> >   x += 42;
> >   asm ("baz\t%0,%1" : "=r" (y) : "r" (x));
> >
> >   return y;
> > }
> >
> > is rewritten during gimplification into
> >
> > register int global asm ("r3");
> >
> > int foo (int x0)
> > {
> >   register int tmp asm ("r5");
> >   int x = x0;
> >   int y = tmp;
> >
> >   asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global));
> >   x += 42;
> >   asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x));
> >
> >   return y;
> > }
> >
> > Note, uninitialized register asm objects may be used as inputs.  Thus,
> > if naively translated into hard register constraints, this would
> > introduce reads from uninitialized objects which init-regs pass would
> > fix up which in turn would mean that once hard register constraints
> > materialize, respective registers would be zeroed (see comment in
> > gimplify.cc for more details).  This is solved by initializing every
> > uninitialized register asm object by a fresh register asm object
> > ensuring that it contains the respective register value.  Subsequent
> > passes remove dead stores in case those objects are eventually
> > initialized at later points or are used exclusively as output operands.
> > Therefore, in most cases, those temporary register asm objects won't
> > materialize.  This is not pretty at all but required in order to compile
> > real world applications as e.g. glibc for target powerpc64le.
> >
> > Hard register constraints are more strict in order to prevent subtle
> > bugs.  This in turn means that certain programs are not valid after
> > register asm demotion.  For example,
> >
> > register int x asm ("r5") = 42;
> > asm ("" : "+r" (x) : "r" (x));
> >
> > is rewritten into
> >
> > int x = 42;
> > asm ("" : "+{r5}" (x) : "{r5}" (x));
> >
> > Now, two inputs refer to the very same register which is invalid.  This
> > example could have been massaged to make it fit, however, there are
> > other examples which cannot.  Currently, I lean towards rejecting those
> > instead of fixing up, since those look like subtle bugs.
> 
> Hmm, but local hardregs are supposed to be only used by extended
> asm as a way to constrain inputs.  With hardreg constraints they should
> no longer be necessary.  So - shouldn't we take the more aggressive
> approach and diagnose them as being deprecated and point to
> hardreg constraints?  Can we even rewrite uses to hardreg constraints
> (by rewriting the asm regs into SSA which, I think, we currently avoid)?

The whole point of this patch is to automatically rewrite register asm
objects into ordinary objects utilizing hard register constraints, so
that old code which still depends on register asm profits from hard
register constraints.  However, I have made hard register constraints
considerably stricter; in other words, register asm gives you more
freedom.  Therefore, automatically translating register asm into hard
register constraints will certainly fail here and there.  You could
argue that the example from above could be massaged to make it fit with
hard register constraints.  However, I was questioning whether it is
worthwhile to implement this kind of logic, since with register asm it
is easy to come up with code which cannot be massaged easily or even at
all, because the semantics are, to my knowledge, not clearly defined.
For example

register int x asm ("r5") = 42;
register int y asm ("r5") = 24;
asm ("" : "=r" (x) : "r" (x), "r" (y));

This kind of code is accepted by gcc/clang at the moment.  My gut
feeling is that this shouldn't have been accepted and is rather a side
effect of the implementation, but I might be completely wrong here.
That being said, with this patch, if flag -fdemote-register-asm is
specified, the code is rewritten into

x = 42;
y = 24;
asm ("" : "={r5}" (x) : "{r5}" (x), "{r5}" (y));

for which we error out since two inputs are bound to the very same
register.  Since all those examples look like subtle bugs to me, I think
it is better to diagnose them, which is what the current implementation
does (although I think the error message could be improved here and
there).
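
To make the failure mode concrete, here is a minimal stand-alone sketch
of the idea.  It is not the code from the patch: struct operand,
rewrite_constraint and inputs_conflict are hypothetical names, only the
plain "r" constraint is modelled, and register classes are ignored
entirely.  It only illustrates why two inputs rewritten to the same
hard register constraint are diagnosed, while an output sharing a
register with an input is not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of an asm operand: the constraint as written and the
   hard register of the register asm object bound to it (NULL for an
   ordinary object).  */
struct operand
{
  const char *constraint;
  const char *hard_reg;
};

/* Skip a leading '=' or '+' modifier.  */
static const char *
skip_modifier (const char *c)
{
  return (*c == '=' || *c == '+') ? c + 1 : c;
}

/* Rewrite a plain "r" constraint into a hard register constraint such as
   "{r5}", keeping the modifier.  Anything else is copied unchanged.  */
static void
rewrite_constraint (const struct operand *op, char *buf, size_t size)
{
  const char *body = skip_modifier (op->constraint);
  if (op->hard_reg != NULL && strcmp (body, "r") == 0)
    snprintf (buf, size, "%.*s{%s}", (int) (body - op->constraint),
              op->constraint, op->hard_reg);
  else
    snprintf (buf, size, "%s", op->constraint);
}

/* Return 1 if, after rewriting, two inputs name the same hard register.
   Pure outputs ('=') are ignored; in-out operands ('+') count as inputs.  */
static int
inputs_conflict (const struct operand *ops, int n)
{
  char a[32], b[32];
  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
      {
        if (ops[i].constraint[0] == '=' || ops[j].constraint[0] == '=')
          continue;
        rewrite_constraint (&ops[i], a, sizeof a);
        rewrite_constraint (&ops[j], b, sizeof b);
        const char *ra = skip_modifier (a);
        const char *rb = skip_modifier (b);
        if (*ra == '{' && strcmp (ra, rb) == 0)
          return 1;
      }
  return 0;
}
```

Feeding it the operands of the example above, i.e. { "=r", "r5" },
{ "r", "r5" }, { "r", "r5" }, reports a conflict because of the two
inputs, whereas { "=r", "r5" }, { "r", "r5" } alone does not.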

Cheers,
Stefan

> 
> Richard.
> 
> >
> > Since I consider this as an experimental feature it is hidden behind new
> > flag -fdemote-register-asm.
> > ---
> >
> > Notes:
> >     Patch v2 vs this one
> >     --------------------
> >
> >     Patch v2
> >     https://inbox.sourceware.org/gcc-patches/[email protected]/
> >     was the last one I posted.
> >
> >     This patch keeps the behaviour if a register of a register asm object is
> >     not entailed in the register class of a corresponding constraint.
> >     Although, I would have rather liked to throw an error since this looks
> >     like a subtle bug to me, I kept this behaviour for the sake of
> >     compatibility.
> >
> >     Furthermore, this patch also deals with uninitialized reads of register
> >     asm objects.  Again, this rather sounds like a bug to me but for the
> >     sake of compatibility I kept this behaviour.
> >
> >     Testing
> >     -------
> >
> >     Bootstrapped on
> >     - aarch64-unknown-linux-gnu
> >     - powerpc64le-unknown-linux-gnu
> >     - s390x-ibm-linux-gnu
> >     - x86_64-pc-linux-gnu
> >
> >     Build and regtested glibc on
> >     - powerpc64le-unknown-linux-gnu
> >     - s390x-ibm-linux-gnu
> >     - x86_64-pc-linux-gnu
> >
> >     Build Linux on
> >     - aarch64-unknown-linux-gnu
> >     - s390x-ibm-linux-gnu
> >     - x86_64-pc-linux-gnu (*)
> >
> >     (*) For x86_64 the Linux kernel exposes a usage of register asm which
> >     leads to an error.   The usage pattern is similar to the one described
> >     in the commit message
> >
> >         register int x asm ("r5") = 42;
> >         asm ("" : "+r" (x) : "r" (x));
> >
> >     For testing purposes I replaced the in-out operand by an out operand and
> >     the Linux kernel compiles fine on x86_64.  See for more details:
> >     https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662725.html
> >
> >     Ok for mainline?
> >
> >  gcc/common.opt                                |   4 +
> >  gcc/gimplify.cc                               | 186 ++++++++++++++++++
> >  .../gcc.dg/asm-hard-reg-demotion-1.c          |  74 +++++++
> >  3 files changed, 264 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 3ad1444cc88..f0bc8492190 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3588,6 +3588,10 @@ fverbose-asm
> >  Common Var(flag_verbose_asm)
> >  Add extra commentary to assembler output.
> >
> > +fdemote-register-asm
> > +Common Var(flag_demote_register_asm) Init(0)
> > +Demote local register asm and use hard register constraints instead.
> > +
> >  fvisibility=
> >  Common Joined RejectNegative Enum(symbol_visibility) Var(default_visibility) Init(VISIBILITY_DEFAULT)
> >  -fvisibility=[default|internal|hidden|protected]       Set the default symbol visibility.
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index e4db4b1d9bd..c6560c14f48 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -2246,6 +2246,40 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> >               && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
> >             gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
> >         }
> > +      else if (flag_demote_register_asm && !init && DECL_HARD_REGISTER (decl))
> > +       {
> > +         /* Register asm objects may be used uninitialized as inputs.
> > +            Therefore, a naive translation into hard register constraints
> > +            would render the demoted objects to be default initialized by
> > +            init-regs and as a consequence, once hard register constraints
> > +            materialize for an Extended Asm, hard registers would be zeroed
> > +            which didn't happen before.  Therefore, to overcome this,
> > +            initialize each demoted object with the current contents of the
> > +            hard register it previously referred to.  For example, translate
> > +
> > +            register int x asm ("r5");
> > +            int y;
> > +            asm ("..." : "=r" (y) : "r" (x));
> > +
> > +            to
> > +
> > +            register int tmp asm ("r5");
> > +            int x = tmp;
> > +            int y;
> > +            asm ("..." : "=r" (y) : "{r5}" (x));
> > +
> > +            Do this unconditionally for all uninitialized register asm objects
> > +            and let subsequent passes remove dead stores in case those objects
> > +            are initialized at later points or are used exclusively as output
> > +            operands.  */
> > +         tree tmp = create_tmp_var (TREE_TYPE (decl), "regasm");
> > +         SET_DECL_ASSEMBLER_NAME (tmp, DECL_ASSEMBLER_NAME (decl));
> > +         DECL_REGISTER (tmp) = 1;
> > +         DECL_HARD_REGISTER (tmp) = 1;
> > +         DECL_INITIAL (decl) = tmp;
> > +         *stmt_p = stmt;
> > +         return gimplify_decl_expr (stmt_p, seq_p);
> > +       }
> >      }
> >
> >    return GS_ALL_DONE;
> > @@ -7976,6 +8010,137 @@ num_alternatives (const_tree link)
> >    return num + 1;
> >  }
> >
> > +static inline bool
> > +rclass_entails_registers (enum reg_class rclass, int regno, int nregs)
> > +{
> > +  for (int i = regno; i < regno + nregs ; ++i)
> > +    if (!TEST_HARD_REG_BIT (reg_class_contents[rclass], i))
> > +      return false;
> > +  return true;
> > +}
> > +
> > +/* Keep track of all register asm which have been replaced by hard register
> > +   constraints.  After all asm statements of a function have been processed,
> > +   demote those to ordinary objects.  */
> > +static hash_set<tree> demote_register_asm;
> > +
> > +/* Rewrite constraints of Extended Asm operands which refer to local register
> > +   asm objects into hard register constraints.  Also mark those objects to be
> > +   demoted from register asm objects to ordinary objects which is done
> > +   basically after gimplification of the function body.
> > +
> > +   For example, the following translation unit
> > +
> > +   register int global asm ("r3");
> > +
> > +   int foo (int x0)
> > +   {
> > +     register int x asm ("r4") = x0;
> > +     register int y asm ("r5");
> > +
> > +     asm ("..." : "=r" (x) : "0" (x), "r" (global));
> > +     x += 42;
> > +     asm ("..." : "=r" (y) : "r" (x));
> > +
> > +     return y;
> > +   }
> > +
> > +   is rewritten into
> > +
> > +   register int global asm ("r3");
> > +
> > +   int foo (int x0)
> > +   {
> > +     register int tmp asm ("r5");
> > +     int x = x0;
> > +     int y = tmp;
> > +
> > +     asm ("..." : "={r4}" (x) : "0" (x), "r" (global));
> > +     x += 42;
> > +     asm ("..." : "={r5}" (y) : "{r4}" (x));
> > +
> > +     return y;
> > +   }
> > +
> > +   Any local register asm which is not initialized at its declaration, is
> > +   implicitly initialized with the contents of the respective hard register.
> > +   See gimplify_decl_expr() for more details.  Ideally we would error out in
> > +   those cases, however, for the sake of compatibility keep the current
> > +   behaviour.
> > +
> > +   Note, only rewrite a constraint in case it entails the registers referred to
> > +   by the corresponding register asm object.  For example, assume that register
> > +   f5 is a floating-point register and is therefore not included in the
> > +   register class associated by constraint r.
> > +
> > +   register float x asm ("f5");
> > +   asm ("..." : "=r" (x));
> > +
> > +   Then the constraint is not altered.  However, the register asm object is
> > +   still demoted to an ordinary object which means we finally end up with
> > +
> > +   float x;
> > +   asm ("..." : "=r" (x));
> > +
> > +   Ideally we would error out here since this rather looks like a bug, however,
> > +   for the sake of compatibility, preserve the current behaviour of register
> > +   asm.  */
> > +
> > +static void
> > +gimplify_demote_register_asm (tree link)
> > +{
> > +  tree op = TREE_VALUE (link);
> > +  if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op))
> > +    return;
> > +  tree id = DECL_ASSEMBLER_NAME (op);
> > +  const char *regname = IDENTIFIER_POINTER (id);
> > +  ++regname;
> > +  int regno = decode_reg_name (regname);
> > +  if (regno < 0)
> > +    /* This indicates an error and we error out later on.  */
> > +    return;
> > +  /* Currently, fixed registers cannot be used for hard register constraints
> > +     which is why we skip those for the moment.  */
> > +  if (fixed_regs[regno])
> > +    return;
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (op));
> > +  int nregs = hard_regno_nregs (regno, mode);
> > +  const char *constraint
> > +    = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
> > +  auto_vec<char, 64> constraint_new;
> > +  for (const char *p = constraint; *p; )
> > +    {
> > +      bool changed_p = false;
> > +      enum constraint_num cn = lookup_constraint (p);
> > +      enum reg_class rclass = reg_class_for_constraint (cn);
> > +      if (rclass != NO_REGS && rclass_entails_registers (rclass, regno, nregs))
> > +       {
> > +         /* At this point we have a constraint which entails all the registers
> > +            required by the register asm operand.  Therefore, rewrite the
> > +            constraint into a corresponding hard register constraint.  */
> > +         constraint_new.safe_push ('{');
> > +         size_t len = strlen (regname);
> > +         for (size_t i = 0; i < len; ++i)
> > +           constraint_new.safe_push (regname[i]);
> > +         constraint_new.safe_push ('}');
> > +         changed_p = true;
> > +       }
> > +
> > +      for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++)
> > +       {
> > +         if (!changed_p)
> > +           constraint_new.safe_push (*p);
> > +         if (*p == '\0')
> > +           break;
> > +       }
> > +    }
> > +  constraint_new.safe_push ('\0');
> > +  unsigned int len = constraint_new.length ();
> > +  tree str = build_string (len, constraint_new.address ());
> > +  TREE_VALUE (TREE_PURPOSE (link)) = str;
> > +  demote_register_asm.add (op);
> > +}
> > +
> >  /* Gimplify the operands of an ASM_EXPR.  Input operands should be a gimple
> >     value; output operands should be a gimple lvalue.  */
> >
> > @@ -8372,6 +8537,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
> >    /* Do not add ASMs with errors to the gimple IL stream.  */
> >    if (ret != GS_ERROR)
> >      {
> > +      if (flag_demote_register_asm)
> > +       {
> > +         for (unsigned i = 0; i < vec_safe_length (outputs); ++i)
> > +           {
> > +             tree link = (*outputs)[i];
> > +             gimplify_demote_register_asm (link);
> > +           }
> > +         for (unsigned i = 0; i < vec_safe_length (inputs); ++i)
> > +           {
> > +             tree link = (*inputs)[i];
> > +             gimplify_demote_register_asm (link);
> > +           }
> > +       }
> > +
> >        stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
> >                                    inputs, outputs, clobbers, labels);
> >
> > @@ -21874,6 +22053,13 @@ gimplify_body (tree fndecl, bool do_parms)
> >           }
> >      }
> >
> > +  for (auto op : demote_register_asm)
> > +    {
> > +      DECL_REGISTER (op) = 0;
> > +      DECL_HARD_REGISTER (op) = 0;
> > +    }
> > +  demote_register_asm.empty ();
> > +
> >    if ((flag_openacc || flag_openmp || flag_openmp_simd)
> >        && gimplify_omp_ctxp)
> >      {
> > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > new file mode 100644
> > index 00000000000..851adb3af40
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > @@ -0,0 +1,74 @@
> > +/* { dg-do compile { target aarch64*-*-* s390*-*-* x86_64-*-* } } */
> > +/* { dg-additional-options "-fdemote-register-asm -fdump-tree-gimple" } */
> > +/* { dg-additional-options "-msse2" { target x86_64-*-* } } */
> > +
> > +#if __aarch64__
> > +# define GPR "r5"
> > +# define FPR "d5"
> > +# define CSTR_GPR "r"
> > +# define CSTR_FPR "w"
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target aarch64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{d5\}\" x1\\);" 1 "gimple" { target aarch64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=w\" x2\\);" 1 "gimple" { target aarch64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target aarch64-*-* } } } */
> > +#elif __s390__
> > +# define GPR "r5"
> > +# define FPR "f5"
> > +# define CSTR_GPR "r"
> > +# define CSTR_FPR "f"
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{r5\}\" x0\\);" 1 "gimple" { target s390*-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{f5\}\" x1\\);" 1 "gimple" { target s390*-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=f\" x2\\);" 1 "gimple" { target s390*-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target s390*-*-* } } } */
> > +#elif __x86_64__
> > +# define GPR "cx"
> > +# define FPR "xmm5"
> > +# define CSTR_GPR "r"
> > +# define CSTR_FPR "x"
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{cx\}\" x0\\);" 1 "gimple" { target x86_64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=\{xmm5\}\" x1\\);" 1 "gimple" { target x86_64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=x\" x2\\);" 1 "gimple" { target x86_64-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "__asm__\\(\"\" : \"=r\" x3\\);" 1 "gimple" { target x86_64-*-* } } } */
> > +#else
> > +# error unsupported target
> > +#endif
> > +
> > +/* Rewrite constraints into hard register constraints and demote register asm
> > +   objects into ordinary objects.  */
> > +
> > +int
> > +test_gpr_constraint_gpr_register (void)
> > +{
> > +  register int x0 __asm__ (GPR);
> > +  __asm__ ("" : "="CSTR_GPR (x0));
> > +  return x0;
> > +}
> > +
> > +float
> > +test_fpr_constraint_fpr_register (void)
> > +{
> > +  register float x1 __asm__ (FPR);
> > +  __asm__ ("" : "="CSTR_FPR (x1));
> > +  return x1;
> > +}
> > +
> > +/* The following two tests are unusual in the sense that the register is not
> > +   subsumed by the constraint.  Keep the current behaviour by not changing the
> > +   constraints and only demote the register asm objects into ordinary objects.
> > +   Erroring out would be probably better since this could be a subtle bug.  */
> > +
> > +int
> > +test_fpr_constraint_gpr_register (void)
> > +{
> > +  register int x2 __asm__ (GPR);
> > +  __asm__ ("" : "="CSTR_FPR (x2));
> > +  return x2;
> > +}
> > +
> > +float
> > +test_gpr_constraint_fpr_register (void)
> > +{
> > +  register float x3 __asm__ (FPR);
> > +  __asm__ ("" : "="CSTR_GPR (x3));
> > +  return x3;
> > +}
> > --
> > 2.53.0
> >
