Hi Claudiu,

Sorry again for the long delay in getting to this.

> -----Original Message-----
> From: [email protected] <claudiu.zissulescu-
> [email protected]>
> Sent: 05 November 2025 09:09
> To: [email protected]
> Cc: [email protected]; [email protected]; Tamar Christina
> <[email protected]>; Wilco Dijkstra <[email protected]>
> Subject: [PATCH v6 5/6] aarch64: Add support for memtag-stack sanitizer using MTE insns
> 
> From: Claudiu Zissulescu <[email protected]>
> 
> The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
> target-specific hooks to create a random tag, add a tag to a memory
> address, and finally tag and untag memory.
> 
> Implement the target hooks to emit MTE instructions if the MEMTAG
> sanitizer is in effect.  Continue to use the default target hooks if
> HWASAN is being used.  The following target hooks are implemented:
>    - TARGET_MEMTAG_INSERT_RANDOM_TAG
>    - TARGET_MEMTAG_ADD_TAG
>    - TARGET_MEMTAG_EXTRACT_TAG
> 
> Apart from the target-specific hooks, set the following to the values
> defined by the Memory Tagging Extension (MTE) on aarch64:
>    - TARGET_MEMTAG_TAG_SIZE
>    - TARGET_MEMTAG_GRANULE_SIZE
> 
> The following instructions were (re-)defined:
>    - addg/subg (used by TARGET_MEMTAG_ADD_TAG and
>      TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
>    - stg/st2g Used to tag/untag a memory granule.
>    - tag_memory A target-specific instruction; it will emit MTE
>      instructions to tag/untag memory of a given size.
>    - compose_tag A target-specific instruction that computes a tagged
>      address as an offset from a base (tagged) address.
>    - gmi Used for randomizing the inserted tag.
>    - irg Likewise.
> 
> gcc/
> 
>       * config/aarch64/aarch64.md (addg): Update pattern to use
>       addg/subg instructions.
>       (stg): Update pattern.
>       (st2g): New pattern.
>       (tag_memory): Likewise.
>       (compose_tag): Likewise.
>       (irg): Update pattern to accept the xzr register.
>       (gmi): Likewise.
>       (UNSPECV_TAG_SPACE): Define.
>       * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
>       Define.
>       (AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
>       (AARCH64_TAG_MEMORY_LOOP_THRESHOLD): Likewise.
>       (aarch64_override_options_internal): Error out if MTE instructions
>       are not available.
>       (aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
>       (aarch64_can_tag_addresses): Add MEMTAG specific handling.
>       (aarch64_memtag_tag_bitsize): New function.
>       (aarch64_memtag_granule_size): Likewise.
>       (aarch64_memtag_insert_random_tag): Likewise.
>       (aarch64_memtag_add_tag): Likewise.
>       (aarch64_memtag_extract_tag): Likewise.
>       (aarch64_granule16_memory_address_p): Likewise.
>       (aarch64_emit_stxg_insn): Likewise.
>       (aarch64_gen_tag_memory_postindex): Likewise.
>       (aarch64_tag_memory_via_loop): New definition.
>       (aarch64_expand_tag_memory): Likewise.
>       (aarch64_check_memtag_ops): Likewise.
>       (TARGET_MEMTAG_TAG_SIZE): Define.
>       (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
>       (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
>       (TARGET_MEMTAG_ADD_TAG): Likewise.
>       (TARGET_MEMTAG_EXTRACT_TAG): Likewise.
>       * config/aarch64/aarch64-builtins.cc
>       (aarch64_expand_builtin_memtag): Update set tag builtin logic.
>       * config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
>       specific options to the linker.
>       * config/aarch64/aarch64-protos.h
>       (aarch64_granule16_memory_address_p): New prototype.
>       (aarch64_check_memtag_ops): Likewise.
>       (aarch64_expand_tag_memory): Likewise.
>       * config/aarch64/constraints.md (Umg): New memory constraint.
>       (Uag): New constraint.
>       (Ung): Likewise.
>       * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
>       Refactor it.
>       (aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
>       refactor it.
>       (aarch64_granule16_memory_operand): New predicate.
>       * config/aarch64/iterators.md (MTE_PP): New code iterator to be
>       used for MTE instructions.
>       (stg_ops): New code attribute.
>       (st2g_ops): Likewise.
>       (mte_name): Likewise.
> 
> doc/
>         * invoke.texi: Update documentation.
> 
> gcc/testsuite:
> 
>       * gcc.target/aarch64/acle/memtag_1.c: Update test.
> 
> Co-authored-by: Indu Bhagat <[email protected]>
> Signed-off-by: Claudiu Zissulescu <[email protected]>
> 
> UPDATE: aarch64: Add support for memtag-stack sanitizer using MTE insns
> ---
>  gcc/config/aarch64/aarch64-builtins.cc        |   7 +-
>  gcc/config/aarch64/aarch64-linux.h            |   4 +-
>  gcc/config/aarch64/aarch64-protos.h           |   3 +
>  gcc/config/aarch64/aarch64.cc                 | 339 +++++++++++++++++-
>  gcc/config/aarch64/aarch64.md                 | 127 +++++--
>  gcc/config/aarch64/constraints.md             |  21 ++
>  gcc/config/aarch64/iterators.md               |  20 ++
>  gcc/config/aarch64/predicates.md              |  13 +-
>  gcc/doc/invoke.texi                           |   6 +-
>  .../gcc.target/aarch64/acle/memtag_1.c        |   4 +-
>  10 files changed, 504 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 408099a50e8..31431693cf2 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -3680,8 +3680,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
>       pat = GEN_FCN (icode) (target, op0, const0_rtx);
>       break;
>        case AARCH64_MEMTAG_BUILTIN_SET_TAG:
> -     pat = GEN_FCN (icode) (op0, op0, const0_rtx);
> -     break;
> +     {
> +       rtx mem = gen_rtx_MEM (TImode, op0);
> +       pat = GEN_FCN (icode) (mem, op0);
> +       break;
> +     }
>        default:
>       gcc_unreachable();
>      }
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 116bb4e69f3..4fa78e0b2f5 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -48,7 +48,9 @@
>     %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
>     -X                                                \
>     %{mbig-endian:-EB} %{mlittle-endian:-EL}     \
> -   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
> +   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} \
> +   %{%:sanitize(memtag-stack):%{!fsanitize-memtag-mode:-z memtag-stack -z memtag-mode=sync}} \
> +   %{%:sanitize(memtag-stack):%{fsanitize-memtag-mode=*:-z memtag-stack -z memtag-mode=%}}"
> 
> 
>  #define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index a9e407ba340..a316e6af4aa 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1127,6 +1127,9 @@ void aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx);
> 
>  bool aarch64_prepare_sve_int_fma (rtx *, rtx_code);
>  bool aarch64_prepare_sve_cond_int_fma (rtx *, rtx_code);
> +
> +bool aarch64_granule16_memory_address_p (rtx mem);
> +void aarch64_expand_tag_memory (rtx, rtx, rtx);
>  #endif /* RTX_CODE */
> 
>  bool aarch64_process_target_attr (tree);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9d2c3431ad3..e74e3b41887 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19108,6 +19108,10 @@ aarch64_override_options_internal (struct gcc_options *opts)
>  #endif
>      }
> 
> +  if (flag_sanitize & SANITIZE_MEMTAG_STACK && !TARGET_MEMTAG)
> +    error ("%<-fsanitize=memtag-stack%> requires the ISA extension %qs",
> +        "memtag");
> +
>    aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
>    if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
>        && !(isa_flags & AARCH64_FL_SME))
> @@ -25679,6 +25683,8 @@ aarch64_asm_output_external (FILE *stream, tree decl, const char* name)
>    aarch64_asm_output_variant_pcs (stream, decl, name);
>  }
> 
> +bool aarch64_can_tag_addresses (void);
> +

Can you just move this up instead of adding the prototype?

>  /* Triggered after a .cfi_startproc directive is emitted into the assembly 
> file.
>     Used to output the .cfi_b_key_frame directive when signing the current
>     function with the B key.  */
> @@ -25689,6 +25695,10 @@ aarch64_post_cfi_startproc (FILE *f, tree ignored ATTRIBUTE_UNUSED)
>    if (cfun->machine->frame.laid_out && aarch64_return_address_signing_enabled ()
>        && aarch64_ra_sign_key == AARCH64_KEY_B)
>       asm_fprintf (f, "\t.cfi_b_key_frame\n");
> +  if (cfun->machine->frame.laid_out && aarch64_can_tag_addresses ()
> +      && memtag_sanitize_p ()
> +      && !known_eq (cfun->machine->frame.frame_size, 0))
> +    asm_fprintf (f, "\t.cfi_mte_tagged_frame\n");
>  }
> 
>  /* Implements TARGET_ASM_FILE_START.  Output the assembly header.  */
> @@ -30365,15 +30375,327 @@ aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
>    return NULL;
>  }
> 
> +#define AARCH64_MEMTAG_GRANULE_SIZE  16
> +#define AARCH64_MEMTAG_TAG_BITSIZE    4
> +
>  /* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES.  Here we tell the rest of the
>     compiler that we automatically ignore the top byte of our pointers, which
> -   allows using -fsanitize=hwaddress.  */
> +   allows using -fsanitize=hwaddress.  In case of -fsanitize=memtag, we
> +   additionally ensure that the target supports MEMTAG insns.  */
>  bool
>  aarch64_can_tag_addresses ()
>  {
> +  if (memtag_sanitize_p ())
> +    return !TARGET_ILP32 && TARGET_MEMTAG;
>    return !TARGET_ILP32;
>  }
> 
> +/* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
> +unsigned char
> +aarch64_memtag_tag_bitsize ()
> +{
> +  if (memtag_sanitize_p ())
> +    return AARCH64_MEMTAG_TAG_BITSIZE;
> +  return default_memtag_tag_bitsize ();
> +}
> +
> +/* Implement TARGET_MEMTAG_GRANULE_SIZE.  */
> +unsigned char
> +aarch64_memtag_granule_size ()
> +{
> +  if (memtag_sanitize_p ())
> +    return AARCH64_MEMTAG_GRANULE_SIZE;
> +  return default_memtag_granule_size ();
> +}
> +
> +/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG.  In the case of MTE instructions,
> +   make sure the gmi and irg instructions are generated when
> +   -fsanitize=memtag-stack is used.  The first argument UNTAGGED can be a
> +   tagged pointer, and its tag is used in the exclusion set.  Thus, the TARGET
> +   doesn't use the same tag.  */
> +rtx
> +aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
> +{
> +  if (memtag_sanitize_p ())
> +    {
> +      insn_code icode = CODE_FOR_gmi;
> +      expand_operand ops_gmi[3];
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      create_output_operand (&ops_gmi[0], tmp, Pmode);
> +      create_input_operand  (&ops_gmi[1], untagged, Pmode);
> +      create_integer_operand  (&ops_gmi[2], 0);
> +      expand_insn (icode, 3, ops_gmi);
> +
> +      icode = CODE_FOR_irg;
> +      expand_operand ops_irg[3];
> +      create_output_operand (&ops_irg[0], target, Pmode);
> +      create_input_operand  (&ops_irg[1], untagged, Pmode);
> +      create_input_operand  (&ops_irg[2], ops_gmi[0].value, Pmode);
> +      expand_insn (icode, 3, ops_irg);
> +      return ops_irg[0].value;
> +    }
> +  else
> +    return default_memtag_insert_random_tag (untagged, target);
> +}
> +
> +/* Implement TARGET_MEMTAG_ADD_TAG.  For memtag sanitizer, emit addg/subg
> +   instructions, otherwise fall back on the default implementation.  */
> +rtx
> +aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  if (memtag_sanitize_p ())
> +    {
> +      rtx target = NULL;
> +      poly_int64 addr_offset = offset;
> +      rtx offset_rtx = gen_int_mode (addr_offset, DImode);
> +
> +      if (!aarch64_granule16_imm6 (offset_rtx, DImode))
> +     {
> +       /* Emit addr arithmetic prior to addg/subg.  */
> +       base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
> +                                   NULL, true, OPTAB_LIB_WIDEN);
> +       addr_offset = 0;
> +     }
> +
> +      insn_code icode = CODE_FOR_addg;
> +      expand_operand ops[4];
> +      create_output_operand (&ops[0], target, DImode);
> +      create_input_operand (&ops[1], base, DImode);
> +      create_integer_operand (&ops[2], addr_offset);
> +      create_integer_operand (&ops[3], tag_offset);
> +      /* Addr offset and tag offset must be within bounds at this time.  */
> +      gcc_assert (aarch64_memtag_tag_offset (ops[3].value, DImode));
> +
> +      expand_insn (icode, 4, ops);
> +      return ops[0].value;
> +    }
> +  else
> +    return default_memtag_add_tag (base, offset, tag_offset);
> +}
> +
> +/* Implement TARGET_MEMTAG_EXTRACT_TAG.  In the case of memtag sanitizer, MTE
> +   instructions allow us to work with the tag-address tuple, thus there is no
> +   need to extract the tag; emit a simple move.  */
> +rtx
> +aarch64_memtag_extract_tag (rtx tagged_pointer, rtx target)
> +{
> +
> +  if (memtag_sanitize_p ())
> +    {
> +      rtx ret = gen_reg_rtx (DImode);
> +      emit_move_insn (ret, gen_lowpart (DImode, tagged_pointer));
> +      return ret;
> +    }
> +  else
> +    return default_memtag_extract_tag (tagged_pointer, target);
> +}
> +
> +/* Return TRUE if x is a valid memory address form for memtag loads and
> +   stores.  */
> +bool
> +aarch64_granule16_memory_address_p (rtx x)
> +{
> +  struct aarch64_address_info addr;
> +
> +  if (!MEM_P (x)
> +      || !aarch64_classify_address (&addr, XEXP (x, 0), GET_MODE (x), false))
> +    return false;
> +
> +  /* Check that the offset, if any, is encodable as 9-bit immediate.  */
> +  switch (addr.type)
> +    {
> +    case ADDRESS_REG_IMM:
> +      return aarch64_granule16_simm9 (gen_int_mode (addr.const_offset, DImode),
> +                                   DImode);
> +
> +    case ADDRESS_REG_REG:
> +      return addr.shift == 0;
> +
> +    default:
> +      break;
> +    }
> +  return false;
> +}
> +
> +/* Helper to emit either stg or st2g instruction.  */
> +static void
> +aarch64_emit_stxg_insn (machine_mode mode, rtx nxt, rtx addr, rtx tagp)
> +{
> +  rtx mem_addr = gen_rtx_MEM (mode, nxt);
> +  rtvec vec = gen_rtvec (2, gen_rtx_MEM (mode, addr), tagp);
> +  rtx unspec = gen_rtx_UNSPEC_VOLATILE (mode, vec, UNSPECV_TAG_SPACE);
> +  emit_set_insn (mem_addr, unspec);
> +}

I guess this indirection is needed since stg and st2g are different named
patterns.  And what's stopping unification here is the constraint modifier
differences between the two patterns?  Which I'll come back to in a bit.

> +
> +/* Generate post-index stg or st2g based on whether ITER_INCR is worth one or
> +   two granules respectively.  */
> +static void
> +aarch64_gen_tag_memory_postindex (rtx addr, rtx tagged_pointer, int offset)
> +{
> +  machine_mode stgmode = TImode;
> +
> +  if (abs (offset) == (AARCH64_MEMTAG_GRANULE_SIZE * 2))
> +    stgmode = OImode;
> +  gcc_assert (abs (offset) == GET_MODE_SIZE (stgmode).to_constant ());
> +
> +  rtx next;
> +  if (offset < 0)
> +    next = gen_rtx_POST_DEC (Pmode, addr);
> +  else
> +    next = gen_rtx_POST_INC (Pmode, addr);
> +
> +  aarch64_emit_stxg_insn (stgmode, next, addr, tagged_pointer);
> +}
> +
> +/* Tag the memory via an explicit loop.  This is used when tag_memory expand
> +   is invoked for:
> +     - non-constant size, or
> +     - constant but not encodable size (!aarch64_granule16_simm9 ()), or
> +     - constant and encodable size (aarch64_granule16_simm9 ()), but over the
> +       unroll threshold (AARCH64_TAG_MEMORY_LOOP_THRESHOLD).  */
> +static void
> +aarch64_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer)
> +{
> +  rtx_code_label *top_label, *bottom_label;
> +  machine_mode iter_mode;
> +  unsigned HOST_WIDE_INT len;
> +  unsigned HOST_WIDE_INT granule_size;
> +  unsigned HOST_WIDE_INT iters;
> +  rtx iter_limit = NULL_RTX;
> +  granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +  unsigned int factor = 1;
> +
> +  iter_mode = GET_MODE (size);
> +  if (iter_mode == VOIDmode)
> +    iter_mode = word_mode;
> +
> +  if (CONST_INT_P (size))
> +    {
> +      len = INTVAL (size);
> +      /* The amount of memory to tag must be aligned to granule size by now.  */
> +      gcc_assert (abs_hwi (len) % granule_size == 0);
> +      iters = abs_hwi (len) / granule_size;
> +      /* Using st2g is always a faster way to tag/untag memory when compared
> +      to stg.  */
> +      if (iters % 2 == 0)
> +     factor = 2;
> +      iter_limit = GEN_INT (abs_hwi (len));

When can len be negative?

> +    }
> +  else
> +    iter_limit = size;
> +
> +  rtx x_addr = base;
> +
> +  /* Generate the following loop (stg example):
> +      mov     x8, #size
> +      cbz     x8, .L3
> +     .L2:
> +      stg     x3, [x3], #16
> +      subs    x8, x8, #16
> +      b.ne    .L2
> +     .L3:
> +      */
> +  int offset = granule_size * factor;
> +  rtx iter_incr = GEN_INT (offset);
> +  /* Emit ITER.  */
> +  rtx iter = gen_reg_rtx (iter_mode);
> +  emit_move_insn (iter, iter_limit);
> +
> +  /* Check if size is zero.  */
> +  bottom_label = gen_label_rtx ();
> +  if (!CONST_INT_P (size))

This just checks whether size is a const int though, it doesn't check for zero?
Think you're missing something like || size == const0_rtx here?
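Something along these lines is what I had in mind (just a sketch, untested;
need_zero_check is a name I made up, not from the patch):

  /* Emit the runtime guard unless SIZE is a known nonzero constant.  */
  bool need_zero_check = !CONST_INT_P (size) || size == const0_rtx;
  if (need_zero_check)
    {
      rtx branch = aarch64_gen_compare_zero_and_branch (EQ, iter,
                                                        bottom_label);
      aarch64_emit_unlikely_jump (branch);
    }

with the emit_label (bottom_label) at the end guarded by the same condition.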

> +    {
> +      rtx branch = aarch64_gen_compare_zero_and_branch (EQ, iter,
> bottom_label);
> +      aarch64_emit_unlikely_jump (branch);
> +    }
> +
> +  /* Prepare the addr operand for tagging memory.  */
> +  rtx addr_reg = gen_reg_rtx (Pmode);
> +  emit_move_insn (addr_reg, x_addr);
> +
> +  top_label = gen_label_rtx ();
> +  /* Emit top label.  */
> +  emit_label (top_label);
> +
> +  /* Tag Memory using post-index stg/st2g.  */
> +  aarch64_gen_tag_memory_postindex (addr_reg, tagged_pointer, offset);
> +

Ok so if iters % 2 == 0 we do st2g, storing two tags at a time, and if not we
do stg.  Is there any reason we can't always use st2g in the loop and only
enter the loop if iters > 1?  And then outside of the loop, if iters % 2 != 0,
use a single stg?
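i.e. something of this shape (sketch only, untested; assuming the remaining
byte count is in x8 and the tagged pointer in x0):

        tbz     x8, #4, .Lloop   // even number of granules, straight to loop
        stg     x0, [x0], #16    // tag the odd granule up front
        subs    x8, x8, #16
        b.eq    .Ldone           // size was exactly one granule
.Lloop:
        st2g    x0, [x0], #32    // always two granules per iteration
        subs    x8, x8, #32
        b.ne    .Lloop
.Ldone:

so the loop body is always the faster st2g.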

> +  /* Decrement ITER by ITER_INCR.  */
> +  emit_insn (gen_subdi3_compare1_imm (iter, iter, iter_incr,
> +                                 GEN_INT (-UINTVAL (iter_incr))));
> +
> +  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +  rtx x = gen_rtx_fmt_ee (NE, CCmode, cc_reg, const0_rtx);
> +  auto jump = emit_jump_insn (gen_aarch64_bcond (x, cc_reg, top_label));
> +  JUMP_LABEL (jump) = top_label;
> +
> +  /* Emit bottom label.  */
> +  if (!CONST_INT_P (size))
> +    emit_label (bottom_label);
> +}
> +
> +/* Threshold in number of granules beyond which an explicit loop for
> +   tagging a memory block is emitted.  */
> +#define AARCH64_TAG_MEMORY_LOOP_THRESHOLD 10
> +

Think we might want this as a target param in aarch64.opt
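Something like the below in aarch64.opt (sketch only; the param and variable
names are placeholders I made up):

-param=aarch64-memtag-tag-memory-loop-threshold=
Target Joined UInteger Var(aarch64_memtag_tag_memory_loop_threshold) Init(10) Param
Maximum number of 16-byte granules to tag inline before emitting a loop.

and then the expander can read the Var() instead of the hard-coded constant.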

> +/* Implement expand for tag_memory.  */
> +void
> +aarch64_expand_tag_memory (rtx base, rtx tagged_pointer, rtx size)
> +{
> +  rtx addr;
> +  HOST_WIDE_INT len, offset;
> +  unsigned HOST_WIDE_INT granule_size;
> +  unsigned HOST_WIDE_INT iters = 0;
> +
> +  granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
> +
> +  if (!REG_P (tagged_pointer))
> +    tagged_pointer = force_reg (Pmode, tagged_pointer);
> +
> +  if (!REG_P (base))
> +    base = force_reg (Pmode, base);
> +
> +  /* If size is small enough, we can unroll the loop using stg/st2g
> +     instructions.  */
> +  if (CONST_INT_P (size))
> +    {
> +      len = INTVAL (size);
> +      if (len == 0)
> +     return; /* Nothing to do.  */
> +
> +      /* The amount of memory to tag must be aligned to granule size by now.  */
> +      gcc_assert (abs_hwi (len) % granule_size == 0);
> +
> +      iters = abs_hwi (len) / granule_size;
> +    }
> +
> +  /* Check predicate on max offset possible: offset (in base rtx) + size.  */
> +  rtx end_addr = simplify_gen_binary (PLUS, Pmode, base, size);
> +  end_addr = gen_rtx_MEM (TImode, end_addr);
> +  if (iters > 0
> +      && iters <= AARCH64_TAG_MEMORY_LOOP_THRESHOLD
> +      && aarch64_granule16_memory_address_p (end_addr))
> +    {
> +      offset = 0;
> +      while (iters)
> +     {
> +       machine_mode mode = TImode;
> +       if (iters / 2)
> +         {
> +           mode = OImode;
> +           iters--;
> +         }
> +       iters--;
> +       addr = plus_constant (Pmode, base, offset);
> +       offset += GET_MODE_SIZE (mode).to_constant ();
> +       aarch64_emit_stxg_insn (mode, addr, addr, tagged_pointer);
> +     }
> +    }
> +  else
> +    aarch64_tag_memory_via_loop (base, size, tagged_pointer);
> +}
> +
>  /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
>     section at the end if needed.  */
>  void
> @@ -32806,6 +33128,21 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_MEMTAG_CAN_TAG_ADDRESSES
>  #define TARGET_MEMTAG_CAN_TAG_ADDRESSES aarch64_can_tag_addresses
> 
> +#undef TARGET_MEMTAG_TAG_BITSIZE
> +#define TARGET_MEMTAG_TAG_BITSIZE aarch64_memtag_tag_bitsize
> +
> +#undef TARGET_MEMTAG_GRANULE_SIZE
> +#define TARGET_MEMTAG_GRANULE_SIZE aarch64_memtag_granule_size
> +
> +#undef TARGET_MEMTAG_INSERT_RANDOM_TAG
> +#define TARGET_MEMTAG_INSERT_RANDOM_TAG aarch64_memtag_insert_random_tag
> +
> +#undef TARGET_MEMTAG_ADD_TAG
> +#define TARGET_MEMTAG_ADD_TAG aarch64_memtag_add_tag
> +
> +#undef TARGET_MEMTAG_EXTRACT_TAG
> +#define TARGET_MEMTAG_EXTRACT_TAG aarch64_memtag_extract_tag
> +
>  #if CHECKING_P
>  #undef TARGET_RUN_TARGET_SELFTESTS
>  #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 98c65a74c8e..0b1e6e18ba5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -412,6 +412,7 @@ (define_c_enum "unspecv" [
>      UNSPECV_GCSPOPM          ; Represent GCSPOPM.
>      UNSPECV_GCSSS1           ; Represent GCSSS1 Xt.
>      UNSPECV_GCSSS2           ; Represent GCSSS2 Xt.
> +    UNSPECV_TAG_SPACE                ; Represent MTE tag memory space.
>      UNSPECV_TSTART           ; Represent transaction start.
>      UNSPECV_TCOMMIT          ; Represent transaction commit.
>      UNSPECV_TCANCEL          ; Represent transaction cancel.
> @@ -8608,46 +8609,48 @@ (define_insn "aarch64_rndrrs"
>  ;; Memory Tagging Extension (MTE) instructions.
> 
>  (define_insn "irg"
> -  [(set (match_operand:DI 0 "register_operand" "=rk")
> +  [(set (match_operand:DI 0 "register_operand")
>       (ior:DI
> -      (and:DI (match_operand:DI 1 "register_operand" "rk")
> +      (and:DI (match_operand:DI 1 "register_operand")
>                (const_int MEMTAG_TAG_MASK))
> -      (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
> +      (ashift:DI (unspec:QI [(match_operand:DI 2 "aarch64_reg_or_zero")]
>                    UNSPEC_GEN_TAG_RND)
>                   (const_int 56))))]
>    "TARGET_MEMTAG"
> -  "irg\\t%0, %1, %2"
> -  [(set_attr "type" "memtag")]
> +  {@ [ cons: =0, 1, 2 ; attrs: type ]
> +     [ rk      , rk, r  ; memtag ] irg\\t%0, %1, %2
> +     [ rk      , rk, Z  ; memtag ] irg\\t%0, %1
> +  }
>  )
> 
>  (define_insn "gmi"
>    [(set (match_operand:DI 0 "register_operand" "=r")
> -     (ior:DI (ashift:DI
> -              (const_int 1)
> -              (and:QI (lshiftrt:DI
> -                       (match_operand:DI 1 "register_operand" "rk")
> -                       (const_int 56)) (const_int 15)))
> -             (match_operand:DI 2 "register_operand" "r")))]
> +     (ior:DI
> +      (unspec:DI [(match_operand:DI 1 "register_operand" "rk")
> +                  (const_int 0)]
> +                 UNSPEC_GEN_TAG)
> +      (match_operand:DI 2 "aarch64_reg_or_zero" "rZ")))]
>    "TARGET_MEMTAG"
> -  "gmi\\t%0, %1, %2"
> +  "gmi\\t%0, %1, %x2"
>    [(set_attr "type" "memtag")]
>  )
> 
>  (define_insn "addg"
> -  [(set (match_operand:DI 0 "register_operand" "=rk")
> +  [(set (match_operand:DI 0 "register_operand")
>       (ior:DI
> -      (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
> -                       (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
> -              (const_int -1080863910568919041)) ;; 0xf0ff...
> +      (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> +                       (match_operand:DI 2 "aarch64_granule16_imm6"))
> +              (const_int MEMTAG_TAG_MASK))
>        (ashift:DI
> -       (unspec:QI
> -        [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
> -         (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
> -        UNSPEC_GEN_TAG)
> +           (unspec:DI [(match_dup 1)
> +                       (match_operand:QI 3 "aarch64_memtag_tag_offset")]
> +                       UNSPEC_GEN_TAG)
>         (const_int 56))))]
>    "TARGET_MEMTAG"
> -  "addg\\t%0, %1, #%2, #%3"
> -  [(set_attr "type" "memtag")]
> +  {@ [ cons: =0 , 1  , 2 , 3 ; attrs: type ]
> +     [ rk       , rk , Uag ,  ; memtag   ] addg\t%0, %1, #%2, #%3
> +     [ rk       , rk , Ung ,  ; memtag   ] subg\t%0, %1, #%n2, #%3
> +  }
>  )
> 
>  (define_insn "subp"
> @@ -8681,17 +8684,83 @@ (define_insn "ldg"
>  ;; STG doesn't align the address but aborts with alignment fault
>  ;; when the address is not 16-byte aligned.
>  (define_insn "stg"
> -  [(set (mem:QI (unspec:DI
> -      [(plus:DI (match_operand:DI 1 "register_operand" "rk")
> -                (match_operand:DI 2 "aarch64_granule16_simm9" "i"))]
> -      UNSPEC_TAG_SPACE))
> -     (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> -                          (const_int 56)) (const_int 15)))]
> +  [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "+Umg")
> +      (unspec_volatile:TI
> +     [(match_dup 0)
> +      (match_operand:DI 1 "register_operand" "rk")]
> +     UNSPECV_TAG_SPACE))]
> +  "TARGET_MEMTAG"
> +  "stg\\t%1, %0"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +(define_insn "stg_<mte_name>"
> +  [(set (mem:TI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+r")))
> +     (unspec_volatile:TI
> +      [(mem:TI (match_dup 0))
> +       (match_operand:DI 1 "register_operand" "rk")]
> +      UNSPECV_TAG_SPACE))]
>    "TARGET_MEMTAG"
> -  "stg\\t%0, [%1, #%2]"
> +  "stg\\t%1, <stg_ops>"
>    [(set_attr "type" "memtag")]
>  )
> 
> +;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
> +;; once, without zero initialization.
> +(define_insn "st2g"
> +  [(set (match_operand:OI 0 "aarch64_granule16_memory_operand" "=Umg")

Should this not be + as well, like the rest? Since st2g also has a writeback
form.
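FWIW, coming back to the unification point: once the constraint is +, the stg
and st2g base patterns look like they could be merged with a mode iterator,
roughly (untested; the iterator and attr names are made up):

(define_mode_iterator STGMODE [TI OI])
(define_mode_attr stg_mnemonic [(TI "stg") (OI "st2g")])

(define_insn "<stg_mnemonic>"
  [(set (match_operand:STGMODE 0 "aarch64_granule16_memory_operand" "+Umg")
	(unspec_volatile:STGMODE
	  [(match_dup 0)
	   (match_operand:DI 1 "register_operand" "rk")]
	  UNSPECV_TAG_SPACE))]
  "TARGET_MEMTAG"
  "<stg_mnemonic>\\t%1, %0"
  [(set_attr "type" "memtag")]
)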

> +      (unspec_volatile:OI
> +     [(match_dup 0)
> +      (match_operand:DI 1 "register_operand" "rk")]
> +     UNSPECV_TAG_SPACE))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%1, %0"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +(define_insn "st2g_<mte_name>"
> +  [(set (mem:OI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+r")))
> +     (unspec_volatile:OI
> +      [(mem:OI (match_dup 0))
> +       (match_operand:DI 1 "register_operand" "rk")]
> +     UNSPECV_TAG_SPACE))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%1, <st2g_ops>"
> +  [(set_attr "type" "memtag")]
> +)
> +
> +(define_expand "tag_memory"
> +  [(match_operand:DI 0 "general_operand" "")
> +   (match_operand:DI 1 "nonmemory_operand" "")
> +   (match_operand:DI 2 "nonmemory_operand" "")]
> +  ""
> +  "
> +{
> +  aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
> +  DONE;
> +}")

Drop the " around the { and }. The { and } indicate it's C code so we don't 
need to
quote it as a string.
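i.e. just:

(define_expand "tag_memory"
  [(match_operand:DI 0 "general_operand" "")
   (match_operand:DI 1 "nonmemory_operand" "")
   (match_operand:DI 2 "nonmemory_operand" "")]
  ""
{
  aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
  DONE;
})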

> +
> +(define_expand "compose_tag"
> +  [(set (match_operand:DI 0 "register_operand")
> +     (ior:DI
> +      (and:DI (plus:DI (match_operand:DI 1 "register_operand")
> +                       (const_int 0))
> +              (const_int MEMTAG_TAG_MASK))
> +      (ashift:DI
> +       (unspec:DI [(match_dup 1)
> +                  (match_operand 2 "immediate_operand")]
> +                  UNSPEC_GEN_TAG)
> +       (const_int 56))))]
> +  ""
> +  "
> +{
> +  if (INTVAL (operands[2]) == 0)
> +    {
> +     emit_move_insn (operands[0], operands[1]);
> +     DONE;
> +    }
> +}")

Same here.

Thanks for working on this,
Tamar

> +
>  ;; Load/Store 64-bit (LS64) instructions.
>  (define_insn "ld64b"
>    [(set (match_operand:V8DI 0 "register_operand" "=r")
> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
> index 7b9e5583bc7..94d2ff4d847 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -346,6 +346,12 @@ (define_memory_constraint "Ump"
>         (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op,
> 0),
>                                                 true,
> ADDR_QUERY_LDP_STP)")))
> 
> +(define_memory_constraint "Umg"
> +  "@internal
> +  A memory address for MTE load/store tag operation."
> +  (and (match_code "mem")
> +       (match_test "aarch64_granule16_memory_address_p (op)")))
> +
>  ;; Used for storing or loading pairs in an AdvSIMD register using an STP/LDP
>  ;; as a vector-concat.  The address mode uses the same constraints as if it
>  ;; were for a single value.
> @@ -600,6 +606,21 @@ (define_address_constraint "Dp"
>   An address valid for a prefetch instruction."
>   (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
> 
> +(define_constraint "Uag"
> +  "@internal
> +  A constant that can be used as address offset for an ADDG operation."
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (ival, 0, 1008)
> +                 && !(ival & 0xf)")))
> +
> +(define_constraint "Ung"
> +  "@internal
> +  A constant that can be used as address offset for an SUBG operation (once
> +  negated)."
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (ival, -1008, -1)
> +                 && !(ival & 0xf)")))
> +
>  (define_constraint "vgb"
>    "@internal
>     A constraint that matches an immediate offset valid for SVE LD1B
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 332e7ffd2ea..586c3bc3285 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -2887,6 +2887,9 @@ (define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
>  ;; SVE integer comparisons.
>  (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
> 
> +;; pre/post-{inc,dec} for mte instructions.
> +(define_code_iterator MTE_PP [post_inc post_dec pre_inc pre_dec])
> +
>  ;; -------------------------------------------------------------------
>  ;; Code Attributes
>  ;; -------------------------------------------------------------------
> @@ -3233,6 +3236,23 @@ (define_code_attr SVE_COND_FP [(plus "UNSPEC_COND_FADD")
>                              (minus "UNSPEC_COND_FSUB")
>                              (mult "UNSPEC_COND_FMUL")])
> 
> +;; Map MTE pre/post to the right asm format
> +(define_code_attr stg_ops [(post_inc "[%0], 16")
> +                        (post_dec "[%0], -16")
> +                        (pre_inc  "[%0, 16]!")
> +                        (pre_dec  "[%0, -16]!")])
> +
> +(define_code_attr st2g_ops [(post_inc "[%0], 32")
> +                         (post_dec "[%0], -32")
> +                         (pre_inc  "[%0, 32]!")
> +                         (pre_dec  "[%0, -32]!")])
> +
> +;; Map MTE pre/post to names
> +(define_code_attr mte_name [(post_inc "postinc")
> +                         (post_dec "postdec")
> +                         (pre_inc "preinc")
> +                         (pre_dec "predec")])
> +
>  ;; -------------------------------------------------------------------
>  ;; Int Iterators.
>  ;; -------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 42304cef439..dca0baf75e0 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -1066,13 +1066,20 @@ (define_predicate "aarch64_bytes_per_sve_vector_operand"
>         (match_test "known_eq (wi::to_poly_wide (op, mode),
>                             BYTES_PER_SVE_VECTOR)")))
> 
> +;; The uimm4 field is a 4-bit field that only accepts immediates in the
> +;; range 0..15.
>  (define_predicate "aarch64_memtag_tag_offset"
>    (and (match_code "const_int")
> -       (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
> +       (match_test "UINTVAL (op) <= 15")))
> +
> +(define_predicate "aarch64_granule16_memory_operand"
> +  (and (match_test "TARGET_MEMTAG")
> +       (match_code "mem")
> +       (match_test "aarch64_granule16_memory_address_p (op)")))
> 
> -(define_predicate "aarch64_granule16_uimm6"
> +(define_predicate "aarch64_granule16_imm6"
>    (and (match_code "const_int")
> -       (match_test "IN_RANGE (INTVAL (op), 0, 1008)
> +       (match_test "IN_RANGE (INTVAL (op), -1008, 1008)
>                   && !(INTVAL (op) & 0xf)")))
> 
>  (define_predicate "aarch64_granule16_simm9"
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0bc22695931..1e05a32a1fc 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -18396,8 +18396,10 @@ for a list of supported options.
>  The option cannot be combined with @option{-fsanitize=thread} or
>  @option{-fsanitize=hwaddress}.  Note that the only targets
>  @option{-fsanitize=hwaddress} is currently supported on are x86-64
> -(only with @code{-mlam=u48} or @code{-mlam=u57} options) and
> AArch64,
> -in both cases only in ABIs with 64-bit pointers.
> +(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64, in both
> +cases only in ABIs with 64-bit pointers.  Similarly,
> +@option{-fsanitize=memtag-stack} is currently only supported on AArch64 ABIs
> +with 64-bit pointers.
> 
>  When compiling with @option{-fsanitize=address}, you should also
>  use @option{-g} to produce more meaningful output.
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> index f8368690032..e94a2220fe3 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
> @@ -54,9 +54,9 @@ test_memtag_6 (void *p)
>    __arm_mte_set_tag (p);
>  }
> 
> -/* { dg-final { scan-assembler-times {irg\tx..?, x..?, x..?\n} 1 } } */
> +/* { dg-final { scan-assembler-times {irg\tx..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {gmi\tx..?, x..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {subp\tx..?, x..?, x..?\n} 1 } } */
>  /* { dg-final { scan-assembler-times {addg\tx..?, x..?, #0, #1\n} 1 } } */
>  /* { dg-final { scan-assembler-times {ldg\tx..?, \[x..?, #0\]\n} 1 } } */
> -/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?, #0\]\n} 1 } } */
> \ No newline at end of file
> +/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?\]\n} 1 } } */
> --
> 2.51.0
