On Mon, Jun 23, 2025 at 07:12:58PM -0600, Jeff Law wrote:
> 
> 
> On 5/20/25 1:22 AM, Stefan Schulze Frielinghaus wrote:
> > Currently a register asm already materializes during expand.  This
> > means, a hard register is allocated for the very first access of a
> > register asm as e.g. in an assignment.  As a consequence this might lead
> > to suboptimal register allocation if the assignment and the using asm
> > statement are spread far apart.  Even more problematic are function
> > calls in between register asm assignments and its using asm statement
> > since hard registers may be clobbered by a call.  The former may be
> > solved by pulling register asm assignments and asm statements close by.
> > However, the latter is not easily solved since sometimes function calls
> > are implicit.  For example
> > 
> > int
> > foo (int *x)
> > {
> >    register int y asm ("0") = 42;
> >    register int z asm ("1") = *x;
> >    asm ("bar\t%0,%1" : "+r" (z) : "r" (y));
> >    return z;
> > }
> > 
> > If compiled with address sanitizer, then a function call is introduced
> > for the memory load which in turn may interfer with the initialization
> > of register asm y.  Likewise, for some targets and configurations even
> > an operation like an addition may lead to an implicit library call.
> > 
> > In contrast hard register constraints materialize during register
> > allocation and therefore do not suffer from this, i.e., asm operands are
> > kept in pseudos until RA.  This patch adds the feature of rewriting
> > local register asm into code which exploits hard register constraints.
> > For example
> > 
> > register int global asm ("r3");
> > 
> > int foo (int x0)
> > {
> >    register int x asm ("r4") = x0;
> >    register int y asm ("r5");
> > 
> >    asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global));
> >    x += 42;
> >    asm ("baz\t%0,%1" : "=r" (y) : "r" (x));
> > 
> >    return y;
> > }
> > 
> > is rewritten during gimplification into
> > 
> > register int global asm ("r3");
> > 
> > int foo (int x0)
> > {
> >    int x = x0;
> >    int y;
> > 
> >    asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global));
> >    x += 42;
> >    asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x));
> > 
> >    return y;
> > }
> > 
> > The resulting code solely relies on hard register constraints modulo
> > global register asm.
> > 
> > Since I consider this as an experimental feature it is hidden behind new
> > flag -fdemote-register-asm (I'm open for other naming suggestions).
> > ---
> >   gcc/common.opt                                |  4 +
> >   gcc/gimplify.cc                               | 87 +++++++++++++++++++
> >   .../gcc.dg/asm-hard-reg-demotion-1.c          | 19 ++++
> >   .../gcc.dg/asm-hard-reg-demotion-2.c          | 19 ++++
> >   .../gcc.dg/asm-hard-reg-demotion-error-1.c    | 29 +++++++
> >   gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h  | 52 +++++++++++
> >   6 files changed, 210 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h
> So essentially this is range shrinking if I'm reading it correctly and I
> would generally consider that a good thing, particularly WRT registers
> appearing in ASMs regardless of the precise form.
> 
> As you note, there are oddball cases where it's not obvious where a call
> might appear.  We don't support it anymore, but the mn102, IIRC did libcalls
> for certain extensions and those libcalls wheren't exposed as such.
> Similarly the PA has millicode calls for div/mod and perhaps other things I
> can't remember anymore.  I *think* they're represented as simple div/mod
> instructions in the IL (it's been decades since I worked on either of those
> targets, so if the exact details are wrong, forgive me)...
> 
> I don't know if the kernel uses ASMs in a way that would trigger this, but
> it might be a good testcase.  Similarly for glibc.

As written in the other reply for patch 2/4 I will do some more testing
including building glibc+kernel.  I will include the results in my
(final?!) post of this patch series.

> 
> I note that you don't do any demotion for clobbers, is that intentional?  Or
> am I mistaken on the implementation details?

I didn't do any demotion of clobbers since I didn't see any value in it.
If a clobbered register gets accidentally clobbered as e.g. by an
implicitly introduced call, I wouldn't mind.

Again, thanks for reviewing all this :)
Stefan


> 
> jeff
> 
> 
> 
> > 
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 0e50305dde8..34570f7dc4a 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3514,6 +3514,10 @@ fverbose-asm
> >   Common Var(flag_verbose_asm)
> >   Add extra commentary to assembler output.
> > +fdemote-register-asm
> > +Common Var(flag_demote_register_asm) Init(0)
> > +Demote local register asm and use hard register constraints instead.
> > +
> >   fvisibility=
> >   Common Joined RejectNegative Enum(symbol_visibility) 
> > Var(default_visibility) Init(VISIBILITY_DEFAULT)
> >   -fvisibility=[default|internal|hidden|protected]  Set the default symbol 
> > visibility.
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index 7f66d4f0be6..1bcd63db4ef 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -7859,6 +7859,72 @@ num_alternatives (const_tree link)
> >     return num + 1;
> >   }
> > +/* Keep track of all register asm which have been replaced by hard register
> > +   constraints.  After all asm statements of a function have been 
> > processed,
> > +   demote those to ordinary variables.  */
> > +static hash_set<tree> demote_register_asm;
> > +
> > +static void
> > +gimplify_demote_register_asm (tree link)
> > +{
> > +  tree op = TREE_VALUE (link);
> > +  if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op))
> > +    return;
> > +  tree id = DECL_ASSEMBLER_NAME (op);
> > +  const char *regname = IDENTIFIER_POINTER (id);
> > +  ++regname;
> > +  int regno = decode_reg_name (regname);
> > +  if (regno < 0)
> > +    /* This indicates an error and we error out later on.  */
> > +    return;
> > +  const char *constraint
> > +    = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (link)));
> > +  auto_vec<char, 64> constraint_new;
> > +  for (const char *p = constraint; *p; )
> > +    {
> > +      bool pushed = false;
> > +      switch (*p)
> > +   {
> > +   case '+': case '=': case '%': case '?': case '!': case '*': case '&':
> > +   case '#': case '$': case '^': case '{': case 'E': case 'F': case 'G':
> > +   case 'H': case 's': case 'i': case 'n': case 'I': case 'J': case 'K':
> > +   case 'L': case 'M': case 'N': case 'O': case 'P': case ',': case '0':
> > +   case '1': case '2': case '3': case '4': case '5': case '6': case '7':
> > +   case '8': case '9': case '[': case '<': case '>': case 'g': case 'X':
> > +     break;
> > +
> > +   default:
> > +     if (!ISALPHA (*p))
> > +       break;
> > +     enum constraint_num cn = lookup_constraint (p);
> > +     enum reg_class rclass = reg_class_for_constraint (cn);
> > +     if (rclass != NO_REGS || insn_extra_address_constraint (cn))
> > +       {
> > +         constraint_new.safe_push ('{');
> > +         size_t len = strlen (regname);
> > +         for (size_t i = 0; i < len; ++i)
> > +           constraint_new.safe_push (regname[i]);
> > +         constraint_new.safe_push ('}');
> > +         pushed = true;
> > +       }
> > +     break;
> > +   }
> > +
> > +      for (size_t len = CONSTRAINT_LEN (*p, p); len; len--, p++)
> > +   {
> > +     if (!pushed)
> > +       constraint_new.safe_push (*p);
> > +     if (*p == '\0')
> > +       break;
> > +   }
> > +    }
> > +  constraint_new.safe_push ('\0');
> > +  unsigned int len = constraint_new.length ();
> > +  tree str = build_string (len, constraint_new.address ());
> > +  TREE_VALUE (TREE_PURPOSE (link)) = str;
> > +  demote_register_asm.add (op);
> > +}
> > +
> >   /* Gimplify the operands of an ASM_EXPR.  Input operands should be a 
> > gimple
> >      value; output operands should be a gimple lvalue.  */
> > @@ -8264,6 +8330,20 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> > gimple_seq *post_p)
> >     /* Do not add ASMs with errors to the gimple IL stream.  */
> >     if (ret != GS_ERROR)
> >       {
> > +      if (flag_demote_register_asm)
> > +   {
> > +     for (unsigned i = 0; i < vec_safe_length (outputs); ++i)
> > +       {
> > +         tree link = (*outputs)[i];
> > +         gimplify_demote_register_asm (link);
> > +       }
> > +     for (unsigned i = 0; i < vec_safe_length (inputs); ++i)
> > +       {
> > +         tree link = (*inputs)[i];
> > +         gimplify_demote_register_asm (link);
> > +       }
> > +   }
> > +
> >         stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING 
> > (expr)),
> >                                inputs, outputs, clobbers, labels);
> > @@ -21023,6 +21103,13 @@ gimplify_body (tree fndecl, bool do_parms)
> >       }
> >       }
> > +  for (auto op : demote_register_asm)
> > +    {
> > +      DECL_REGISTER (op) = 0;
> > +      DECL_HARD_REGISTER (op) = 0;
> > +    }
> > +  demote_register_asm.empty ();
> > +
> >     if ((flag_openacc || flag_openmp || flag_openmp_simd)
> >         && gimplify_omp_ctxp)
> >       {
> > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c 
> > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > new file mode 100644
> > index 00000000000..541a66a8d05
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do run { target aarch64*-*-* powerpc64*-*-* riscv64-*-* s390*-*-* 
> > x86_64-*-* } } */
> > +/* { dg-options "-O2 -fdemote-register-asm" } */
> > +
> > +#include "asm-hard-reg-demotion.h"
> > +
> > +int
> > +main (void)
> > +{
> > +  if (bar (0) != 0
> > +      || bar (1) != 1
> > +      || bar (2) != 2
> > +      || bar (32) != 32
> > +      || baz (0) != 0
> > +      || baz (1) != 1
> > +      || baz (2) != 2
> > +      || baz (32) != 32)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c 
> > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c
> > new file mode 100644
> > index 00000000000..3d216d440af
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do run { target aarch64*-*-* powerpc64*-*-* riscv64-*-* s390*-*-* 
> > x86_64-*-* } } */
> > +/* { dg-options "-O2 -fno-demote-register-asm" } */
> > +
> > +#include "asm-hard-reg-demotion.h"
> > +
> > +int
> > +main (void)
> > +{
> > +  if (bar (0) != 42
> > +      || bar (1) != 42
> > +      || bar (2) != 42
> > +      || bar (32) != 42
> > +      || baz (0) != 0
> > +      || baz (1) != 1
> > +      || baz (2) != 2
> > +      || baz (32) != 32)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c 
> > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c
> > new file mode 100644
> > index 00000000000..097a6ff901d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion-error-1.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-fdemote-register-asm" } */
> > +
> > +/* Verify input operands.  */
> > +
> > +int
> > +test (int x)
> > +{
> > +  register int y __asm__ ("0") = x;
> > +
> > +  /* Preserve status quo for register asm and don't error out.  This must 
> > hold
> > +     true even if we replace register asm with hard register constraints.  
> > */
> > +  __asm__ ("" : "+r" (y) : "r" (y));
> > +  __asm__ ("" : "=r" (y) : "0" (y), "r" (y));
> > +  __asm__ ("" : "=r" (y) : "r" (y), "r" (y));
> > +
> > +  /* Be more strict for hard register constraints and error out.  */
> > +  __asm__ ("" : "+{0}" (x) : "{0}" (x)); /* { dg-error "multiple inputs to 
> > hard register" } */
> > +  __asm__ ("" : "={0}" (x) : "0" (x), "{0}" (x)); /* { dg-error "multiple 
> > inputs to hard register" } */
> > +  __asm__ ("" : "=r" (x) : "{0}" (x), "{0}" (x)); /* { dg-error "multiple 
> > inputs to hard register" } */
> > +
> > +  /* Still error out in case of a mixture of both constructs.  */
> > +  __asm__ ("" : "+{0}" (x) : "r" (y)); /* { dg-error "multiple inputs to 
> > hard register" } */
> > +  __asm__ ("" : "={0}" (x) : "0" (x), "r" (y)); /* { dg-error "multiple 
> > inputs to hard register" } */
> > +  __asm__ ("" : "+r" (y) : "{0}" (x)); /* { dg-error "multiple inputs to 
> > hard register" } */
> > +  __asm__ ("" : "=r" (y) : "0" (y), "{0}" (x)); /* { dg-error "multiple 
> > inputs to hard register" } */
> > +
> > +  return x + y;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h 
> > b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h
> > new file mode 100644
> > index 00000000000..6d72f622ce9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h
> > @@ -0,0 +1,52 @@
> > +/* Pass parameter x in the first general argument register to the assembler
> > +   instruction.
> > +
> > +   In function bar we fail to do so because after the function call to foo,
> > +   variable argreg1 does not contain the value of x but rather 42 which got
> > +   passed to foo.  Thus, the function always returns 42.  In contrast in
> > +   function baz, variable x is saved over the function call and 
> > materializes in
> > +   the asm statement and therefore is returned.  */
> > +
> > +#if defined (__aarch64__)
> > +# define REG register int argreg1 __asm__ ("x0") = x;
> > +# define MOVE1 __asm__ ("mov\t%0,%1" : "=r" (out) : "r" (argreg1));
> > +# define MOVE2 __asm__ ("mov\t%0,%1" : "=r" (out) : "{x0}" (x));
> > +#elif defined (__powerpc__) || defined (__POWERPC__)
> > +# define REG register int argreg1 __asm__ ("r3") = x;
> > +# define MOVE1 __asm__ ("mr\t%0,%1" : "=r" (out) : "r" (argreg1));
> > +# define MOVE2 __asm__ ("mr\t%0,%1" : "=r" (out) : "{r3}" (x));
> > +#elif defined (__riscv)
> > +# define REG register int argreg1 __asm__ ("a0") = x;
> > +# define MOVE1 __asm__ ("mv\t%0,%1" : "=r" (out) : "r" (argreg1));
> > +# define MOVE2 __asm__ ("mv\t%0,%1" : "=r" (out) : "{a0}" (x));
> > +#elif defined (__s390__)
> > +# define REG register int argreg1 __asm__ ("r2") = x;
> > +# define MOVE1 __asm__ ("lr\t%0,%1" : "=r" (out) : "r" (argreg1));
> > +# define MOVE2 __asm__ ("lr\t%0,%1" : "=r" (out) : "{r2}" (x));
> > +#elif defined (__x86_64__)
> > +# define REG register int argreg1 __asm__ ("edi") = x;
> > +# define MOVE1 __asm__ ("mov\t%1,%0" : "=r" (out) : "r" (argreg1));
> > +# define MOVE2 __asm__ ("mov\t%1,%0" : "=r" (out) : "{edi}" (x));
> > +#endif
> > +
> > +__attribute__ ((noipa))
> > +int foo (int unused) { }
> > +
> > +int
> > +bar (int x)
> > +{
> > +  int out;
> > +  REG
> > +  foo (42);
> > +  MOVE1
> > +  return out;
> > +}
> > +
> > +int
> > +baz (int x)
> > +{
> > +  int out;
> > +  foo (42);
> > +  MOVE2
> > +  return out;
> > +}
> 

Reply via email to