Hi Robin,
+static rtx
+get_fof_set_vl_reg (rtx_insn *rinsn)
+{
+ if (!fault_first_load_p (rinsn))
+ return NULL_RTX;
+
+ rtx pat = PATTERN (rinsn);
+ if (GET_CODE (pat) != PARALLEL)
+ return NULL_RTX;
+
+ if (XVECLEN (pat, 0) != 3)
+ return NULL_RTX;
+
+ rtx sub = XVECEXP (pat, 0, 2);
+ if (GET_CODE (sub) == SET
+ && GET_CODE (SET_SRC (sub)) == UNSPEC
+ && XINT (SET_SRC (sub), 1) == UNSPEC_READ_VL)
+ return SET_DEST (sub);
+
+ return NULL_RTX;
+}
Could we add an additional attribute or something to simplify this check?
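
One possibility (an untested sketch; the attribute name and values are made up for illustration) would be a marker attribute in vector.md, defaulting to "false" and set to "true" only in the new @pred_fault_load_set_vl patterns:

```
;; Hypothetical marker attribute for FoF loads that write VL to a
;; register; name and values are illustrative only.
(define_attr "fof_set_vl" "false,true" (const_string "false"))
```

get_fof_set_vl_reg could then bail out early via `get_attr_fof_set_vl (rinsn) == FOF_SET_VL_TRUE` and would only need the PARALLEL walk to extract the destination register.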
> From: "Robin Dapp"<[email protected]>
> Date: Wed, Jan 28, 2026, 23:55
> Subject: [PATCH] RISC-V: Handle VL-setting FoF loads. [PR123806]
> To: "gcc-patches"<[email protected]>
> Cc: <[email protected]>, <[email protected]>, <[email protected]>,
> <[email protected]>, <[email protected]>
> Hi,
>
> For PR122869 I thought I fixed the issue of VL-spills clobbering
> explicit VL reads after fault-only-first (FoF) loads but it turns
> out the fix is insufficient. Even though it avoided the original
> issue, we can still have spills that clobber VL before the read_vl
> RTL pattern. That's mostly due to us hiding the VL data flow from
> the optimizers so a regular spill to memory can and will introduce
> a VL clobber. In vsetvl we catch all the regular cases but not the
> FoF-load case of PR123806 and PR122869.
>
> This patch adds specific FoF patterns that emit the same instruction but
> have a register-setting VL pattern inside the insn's PARALLEL.
> This serves as a marker that lets the vsetvl pass recognize when we
> would clobber VL before reading its value. In that case we now emit an
> explicit csrr ..,vl.
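>
> Schematically (operands elided), the new insn's PARALLEL looks like:
>
> ```
> (parallel
>  [(set (reg:V_VLS ...) (if_then_else ... (unspec ... UNSPEC_VLEFF) ...))
>   (set (reg:SI VL_REGNUM) (unspec:SI ... UNSPEC_MODIFY_VL))
>   (set (reg:P ...) (unspec:P [(reg:SI VL_REGNUM)] UNSPEC_READ_VL))])
> ```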
>
> After vsetvl it's safe to emit the read_vls because at that point the
> VL dataflow has been established and we can be sure not to clobber VL
> anymore.
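>
> For illustration (register choices arbitrary), a FoF load with VL
> output then ends up as a sequence like:
>
> ```
> vle32ff.v v8,(a0)  # fault-only-first load; may truncate vl
> csrr      a1,vl    # explicit read of the updated vl
> ```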
>
> Thus, the main changes are:
> - Unify read_vl si and di and make it an UNSPEC. We don't optimize
> it anyway so a unified one is easier to include in the new FoF
> VL-setter variants.
> - Introduce VL-setting variants of FoF loads and handle them like
> read_vl()s in the vsetvl pass.
> - Emit read_vl()s after vsetvl insertion is done.
>
> What this doesn't get rid of is the XFAIL in ff-load-3.c that I
> introduced for PR122869. The code is still "good" at -O1 and
> "bad" at -O2 upwards.
>
> Regtested on rv64gcv_zvl512b.
>
> Regards
> Robin
>
> PR target/123806
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-string.cc (expand_rawmemchr): Use unified
> vl_read.
> (expand_strcmp): Ditto.
> * config/riscv/riscv-vector-builtins-bases.cc: Use unified read_vl.
> * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_fof_load_insn):
> Only emit the store and not the VL read.
> * config/riscv/riscv-vsetvl.cc (get_fof_set_vl_reg): New
> function.
> (init_rtl_ssa): New wrapper.
> (finish_rtl_ssa): Ditto.
> (emit_fof_read_vls): Emit read_vl after each fault-only-first
> load.
> (pass_vsetvl::simple_vsetvl): Call emit_fof_read_vls ().
> (pass_vsetvl::lazy_vsetvl): Ditto.
> * config/riscv/vector-iterators.md: Add read_vl unspec.
> * config/riscv/vector.md (read_vlsi): Unify.
> (@read_vl<mode>): Ditto.
> (read_vldi_zero_extend): Ditto.
> (@pred_fault_load_set_vl<V_VLS:mode><P:mode>): New FoF variant
> that saves VL in a register.
> (@pred_fault_load_set_vl<VT:mode><P:mode>): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/pr123806.C: New test.
> * g++.target/riscv/rvv/base/pr123808.C: New test.
> * g++.target/riscv/rvv/base/pr123808-2.C: New test.
> ---
> gcc/config/riscv/riscv-string.cc | 10 +-
> .../riscv/riscv-vector-builtins-bases.cc | 5 +-
> gcc/config/riscv/riscv-vector-builtins.cc | 30 ++---
> gcc/config/riscv/riscv-vsetvl.cc | 115 ++++++++++++++++--
> gcc/config/riscv/vector-iterators.md | 1 +
> gcc/config/riscv/vector.md | 80 ++++++++++--
> .../g++.target/riscv/rvv/base/pr123806.C | 25 ++++
> .../g++.target/riscv/rvv/base/pr123808-2.C | 51 ++++++++
> .../g++.target/riscv/rvv/base/pr123808.C | 50 ++++++++
> 9 files changed, 315 insertions(+), 52 deletions(-)
> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr123806.C
> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr123808-2.C
> create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr123808.C
>
> diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
> index 3e7896b36fc..ad71a103edc 100644
> --- a/gcc/config/riscv/riscv-string.cc
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -1402,10 +1402,7 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx haystack, rtx needle,
> riscv_vector::UNARY_OP, vlops);
>
> /* Read how far we read. */
> - if (Pmode == SImode)
> - emit_insn (gen_read_vlsi (cnt));
> - else
> - emit_insn (gen_read_vldi_zero_extend (cnt));
> + emit_insn (gen_read_vl (Pmode, cnt));
>
> /* Compare needle with haystack and store in a mask. */
> rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, needle),
> vec);
> @@ -1520,10 +1517,7 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
> }
>
> /* Read the vl for the next pointer bump. */
> - if (Pmode == SImode)
> - emit_insn (gen_read_vlsi (cnt));
> - else
> - emit_insn (gen_read_vldi_zero_extend (cnt));
> + emit_insn (gen_read_vl (Pmode, cnt));
>
> if (with_length)
> {
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 0bb878f0122..525a622882a 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -1926,10 +1926,7 @@ public:
>
> rtx expand (function_expander &e) const override
> {
> - if (Pmode == SImode)
> - emit_insn (gen_read_vlsi (e.target));
> - else
> - emit_insn (gen_read_vldi_zero_extend (e.target));
> + emit_insn (gen_read_vl (Pmode, e.target));
> return e.target;
> }
> };
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
> index 63cf4d691e7..92f343c0044 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -4912,24 +4912,24 @@ function_expander::use_fof_load_insn ()
> tree arg = CALL_EXPR_ARG (exp, vl_dest_arg);
>
> /* Use a regular FoF load if the user does not want to store VL. */
> - insn_code icode = code_for_pred_fault_load (mode);
> - rtx result = generate_insn (icode);
> -
> - /* If user wants VL stored, emit a read_vl and store to memory. */
> - if (!integer_zerop (arg))
> + if (integer_zerop (arg))
> {
> - rtx vl_reg = gen_reg_rtx (Pmode);
> - if (Pmode == SImode)
> - emit_insn (gen_read_vlsi (vl_reg));
> - else
> - emit_insn (gen_read_vldi_zero_extend (vl_reg));
> -
> - rtx addr = expand_normal (arg);
> - rtx mem = gen_rtx_MEM (Pmode, memory_address (Pmode, addr));
> - emit_move_insn (mem, vl_reg);
> + insn_code icode = code_for_pred_fault_load (mode);
> + return generate_insn (icode);
> }
>
> - return result;
> + /* The VL-setting FoF load writes the new VL to VL_REG.
> + Store it to memory. */
> + rtx vl_reg = gen_reg_rtx (Pmode);
> + add_output_operand (Pmode, vl_reg);
> + insn_code icode = code_for_pred_fault_load_set_vl (mode, Pmode);
> + rtx res = generate_insn (icode);
> +
> + rtx addr = expand_normal (arg);
> + rtx mem = gen_rtx_MEM (Pmode, memory_address (Pmode, addr));
> + emit_move_insn (mem, vl_reg);
> +
> + return res;
> }
>
> /* Use contiguous store INSN. */
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 64fa809b801..e2ba8e1c3d1 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -291,6 +291,87 @@ fault_first_load_p (rtx_insn *rinsn)
> || get_attr_type (rinsn) == TYPE_VLSEGDFF);
> }
>
> +/* Return the VL output register of RINSN if it is a fault-only-first
> + load with VL output (pred_fault_load_set_vl pattern), or NULL_RTX
> + otherwise.
> + The pattern contains: (set vl_output (unspec:P [(reg:SI VL_REGNUM)]
> + UNSPEC_READ_VL)). */
> +static rtx
> +get_fof_set_vl_reg (rtx_insn *rinsn)
> +{
> + if (!fault_first_load_p (rinsn))
> + return NULL_RTX;
> +
> + rtx pat = PATTERN (rinsn);
> + if (GET_CODE (pat) != PARALLEL)
> + return NULL_RTX;
> +
> + if (XVECLEN (pat, 0) != 3)
> + return NULL_RTX;
> +
> + rtx sub = XVECEXP (pat, 0, 2);
> + if (GET_CODE (sub) == SET
> + && GET_CODE (SET_SRC (sub)) == UNSPEC
> + && XINT (SET_SRC (sub), 1) == UNSPEC_READ_VL)
> + return SET_DEST (sub);
> +
> + return NULL_RTX;
> +}
> +
> +/* Initialize RTL SSA and related infrastructure for vsetvl analysis. */
> +static void
> +init_rtl_ssa ()
> +{
> + calculate_dominance_info (CDI_DOMINATORS);
> + loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> + connect_infinite_loops_to_exit ();
> + df_analyze ();
> + crtl->ssa = new function_info (cfun);
> +}
> +
> +/* Finalize RTL SSA and cleanup. */
> +static void
> +finish_rtl_ssa ()
> +{
> + free_dominance_info (CDI_DOMINATORS);
> + loop_optimizer_finalize ();
> + if (crtl->ssa->perform_pending_updates ())
> + cleanup_cfg (0);
> + delete crtl->ssa;
> + crtl->ssa = nullptr;
> +}
> +
> +/* Emit read_vl instructions after fault-only-first loads that have
> + a VL output register.
> + This needs to happen last, i.e. once the VL dataflow has been made
> + explicit by inserting vsetvls. */
> +
> +static void
> +emit_fof_read_vls ()
> +{
> + basic_block bb;
> + rtx_insn *rinsn;
> +
> + FOR_EACH_BB_FN (bb, cfun)
> + FOR_BB_INSNS (bb, rinsn)
> + {
> + if (!NONDEBUG_INSN_P (rinsn))
> + continue;
> +
> + rtx vl_dest = get_fof_set_vl_reg (rinsn);
> + if (!vl_dest)
> + continue;
> +
> + if (dump_file)
> + fprintf (dump_file,
> + " Inserting read_vl after FoF insn %d into r%d\n",
> + INSN_UID (rinsn), REGNO (vl_dest));
> +
> + rtx read_vl_pat = gen_read_vl (Pmode, vl_dest);
> + emit_insn_after (read_vl_pat, rinsn);
> + }
> +}
> +
> /* Return true if the instruction is read vl instruction. */
> static bool
> read_vl_insn_p (rtx_insn *rinsn)
> @@ -1186,6 +1267,13 @@ public:
> break;
> }
> }
> + /* If no csrr was found but this is a _set_vl style fault-only-first
> + load, use the insn itself as the VL source.
> + If we have two identical vector configs that differ only in
> + AVL, and the AVL is just "modified" by a read_vl, we
> + can consider them equal and elide the second one. */
> + if (!m_read_vl_insn && get_fof_set_vl_reg (insn->rtl ()))
> + m_read_vl_insn = insn;
> }
> }
>
> @@ -2420,13 +2508,7 @@ public:
> m_avin (nullptr), m_avout (nullptr), m_kill (nullptr), m_antloc (nullptr),
> m_transp (nullptr), m_insert (nullptr), m_del (nullptr), m_edges (nullptr)
> {
> - /* Initialization of RTL_SSA. */
> - calculate_dominance_info (CDI_DOMINATORS);
> - loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> - /* Create FAKE edges for infinite loops. */
> - connect_infinite_loops_to_exit ();
> - df_analyze ();
> - crtl->ssa = new function_info (cfun);
> + init_rtl_ssa ();
> m_vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
> compute_probabilities ();
> m_unknown_info.set_unknown ();
> @@ -2434,12 +2516,7 @@ public:
>
> void finish ()
> {
> - free_dominance_info (CDI_DOMINATORS);
> - loop_optimizer_finalize ();
> - if (crtl->ssa->perform_pending_updates ())
> - cleanup_cfg (0);
> - delete crtl->ssa;
> - crtl->ssa = nullptr;
> + finish_rtl_ssa ();
>
> if (m_reg_def_loc)
> sbitmap_vector_free (m_reg_def_loc);
> @@ -3608,6 +3685,11 @@ pass_vsetvl::simple_vsetvl ()
> }
> }
> }
> +
> + if (dump_file)
> + fprintf (dump_file, "\nEmit missing read_vl()s for fault-only-first "
> + "loads\n");
> + emit_fof_read_vls ();
> }
>
> /* Lazy vsetvl insertion for optimize > 0. */
> @@ -3656,6 +3738,13 @@ pass_vsetvl::lazy_vsetvl ()
> "\nPhase 4: Insert, modify and remove vsetvl insns.\n\n");
> pre.emit_vsetvl ();
>
> + /* Phase 4b: Emit read_vl for fault-only-first loads with VL output
> + register. */
> + if (dump_file)
> + fprintf (dump_file, "\nPhase 4b: Emit missing read_vl()s for "
> + "fault-only-first loads\n");
> + emit_fof_read_vls ();
> +
> /* Phase 5: Cleanup */
> if (dump_file)
> fprintf (dump_file, "\nPhase 5: Cleanup\n\n");
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index 49b0619f6f0..b2383de8549 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -79,6 +79,7 @@ (define_c_enum "unspec" [
> UNSPEC_VCOMPRESS
> UNSPEC_VLEFF
> UNSPEC_MODIFY_VL
> + UNSPEC_READ_VL
>
> UNSPEC_VFFMA
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 18d9c2b3346..1b5c2cbe93b 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -8537,21 +8537,13 @@ (define_insn "@pred_compress<mode>"
> ;; - 7.7. Unit-stride Fault-Only-First Loads
> ;; -------------------------------------------------------------------------------
>
> -(define_insn "read_vlsi"
> - [(set (match_operand:SI 0 "register_operand" "=r")
> - (reg:SI VL_REGNUM))]
> +(define_insn "@read_vl<mode>"
> + [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P [(reg:SI VL_REGNUM)] UNSPEC_READ_VL))]
> "TARGET_VECTOR"
> "csrr\t%0,vl"
> [(set_attr "type" "rdvl")
> - (set_attr "mode" "SI")])
> -
> -(define_insn "read_vldi_zero_extend"
> - [(set (match_operand:DI 0 "register_operand" "=r")
> - (zero_extend:DI (reg:SI VL_REGNUM)))]
> - "TARGET_VECTOR && TARGET_64BIT"
> - "csrr\t%0,vl"
> - [(set_attr "type" "rdvl")
> - (set_attr "mode" "DI")])
> + (set_attr "mode" "<MODE>")])
>
> (define_insn "@pred_fault_load<mode>"
> [(set (match_operand:V_VLS 0 "register_operand" "=vd, vd, vr, vr")
> @@ -8581,6 +8573,37 @@ (define_insn "@pred_fault_load<mode>"
> [(set_attr "type" "vldff")
> (set_attr "mode" "<MODE>")])
>
> +(define_insn "@pred_fault_load_set_vl<V_VLS:mode><P:mode>"
> + [(set (match_operand:V_VLS 0 "register_operand" "= vd, vd, vr, vr")
> + (if_then_else:V_VLS
> + (unspec:<V_VLS:VM>
> + [(match_operand:<V_VLS:VM> 1 "vector_mask_operand" " vm, vm, Wc1, Wc1")
> + (match_operand 4 "vector_length_operand" " rvl, rvl, rvl, rvl")
> + (match_operand 5 "const_int_operand" " i, i, i, i")
> + (match_operand 6 "const_int_operand" " i, i, i, i")
> + (match_operand 7 "const_int_operand" " i, i, i, i")
> + (reg:SI VL_REGNUM)
> + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (unspec:V_VLS
> + [(match_operand:V_VLS 3 "memory_operand" " m, m, m, m")] UNSPEC_VLEFF)
> + (match_operand:V_VLS 2 "vector_merge_operand" " vu, 0, vu, 0")))
> + (set (reg:SI VL_REGNUM)
> + (unspec:SI
> + [(if_then_else:V_VLS
> + (unspec:<V_VLS:VM>
> + [(match_dup 1) (match_dup 4) (match_dup 5)
> + (match_dup 6) (match_dup 7)
> + (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (unspec:V_VLS [(match_dup 3)] UNSPEC_VLEFF)
> + (match_dup 2))] UNSPEC_MODIFY_VL))
> +
> + (set (match_operand:P 8 "register_operand" "= r, r, r, r")
> + (unspec:P [(reg:SI VL_REGNUM)] UNSPEC_READ_VL))]
> + "TARGET_VECTOR"
> + "vle<sew>ff.v\t%0,%3%p1"
> + [(set_attr "type" "vldff")
> + (set_attr "mode" "<V_VLS:MODE>")])
> +
>
> ;; -------------------------------------------------------------------------------
> ;; ---- Predicated Segment loads/stores
> @@ -8698,6 +8721,39 @@ (define_insn "@pred_fault_load<mode>"
> [(set_attr "type" "vlsegdff")
> (set_attr "mode" "<MODE>")])
>
> +(define_insn "@pred_fault_load_set_vl<VT:mode><P:mode>"
> + [(set (match_operand:VT 0 "register_operand" "= vr, vr, vd")
> + (if_then_else:VT
> + (unspec:<VT:VM>
> + [(match_operand:<VT:VM> 1 "vector_mask_operand" "vmWc1, Wc1, vm")
> + (match_operand 4 "vector_length_operand" " rvl, rvl, rvl")
> + (match_operand 5 "const_int_operand" " i, i, i")
> + (match_operand 6 "const_int_operand" " i, i, i")
> + (match_operand 7 "const_int_operand" " i, i, i")
> + (reg:SI VL_REGNUM)
> + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (unspec:VT
> + [(match_operand:VT 3 "memory_operand" " m, m, m")
> + (mem:BLK (scratch))] UNSPEC_VLEFF)
> + (match_operand:VT 2 "vector_merge_operand" " 0, vu, vu")))
> + (set (reg:SI VL_REGNUM)
> + (unspec:SI
> + [(if_then_else:VT
> + (unspec:<VT:VM>
> + [(match_dup 1) (match_dup 4) (match_dup 5)
> + (match_dup 6) (match_dup 7)
> + (reg:SI VL_REGNUM)
> + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (unspec:VT
> + [(match_dup 3) (mem:BLK (scratch))] UNSPEC_VLEFF)
> + (match_dup 2))] UNSPEC_MODIFY_VL))
> + (set (match_operand:P 8 "register_operand" "= r, r, r")
> + (unspec:P [(reg:SI VL_REGNUM)] UNSPEC_READ_VL))]
> + "TARGET_VECTOR"
> + "vlseg<nf>e<sew>ff.v\t%0,%3%p1"
> + [(set_attr "type" "vlsegdff")
> + (set_attr "mode" "<VT:MODE>")])
> +
> (define_insn "@pred_indexed_<order>load<V1T:mode><RATIO64I:mode>"
> [(set (match_operand:V1T 0 "register_operand" "=&vr, &vr")
> (if_then_else:V1T
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr123806.C b/gcc/testsuite/g++.target/riscv/rvv/base/pr123806.C
> new file mode 100644
> index 00000000000..b4c0d22a326
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr123806.C
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target riscv_v_ok } */
> +/* { dg-add-options riscv_v } */
> +
> +#include <riscv_vector.h>
> +#include <vector>
> +
> +int8_t a[5], d[5], c[5], b[5];
> +int main() {
> + for (size_t e = 0, avl = 1; avl > 0;) {
> + size_t f = __riscv_vsetvl_e8m1(avl);
> + vint8m1_t g = __riscv_vle8_v_i8m1(&a[e], f);
> + vint8mf2_t i = __riscv_vle8ff(
> + __riscv_vlm_v_b16(std::vector<uint8_t>((f + 7) / 8, 5).data(), f),
> + &b[e], &f, f);
> + vint8m1_t j = __riscv_vle8_v_i8m1(&c[e], f);
> + vint8m1_t k = __riscv_vredxor_tu(g, i, j, f);
> + __riscv_vse8_v_i8m1(&d[e], k, f);
> + avl -= f;
> +
> + if (f != 1 && avl != 0)
> + __builtin_abort ();
> + break;
> + }
> +}
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr123808-2.C b/gcc/testsuite/g++.target/riscv/rvv/base/pr123808-2.C
> new file mode 100644
> index 00000000000..c439b31800b
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr123808-2.C
> @@ -0,0 +1,51 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target riscv_v_ok } */
> +/* { dg-add-options riscv_v } */
> +/* { dg-additional-options "-O0" } */
> +
> +#include <riscv_vector.h>
> +#include <vector>
> +#define a 36
> +
> +uint8_t e[a], x[a];
> +int64_t f[a], g[a], l[a];
> +float j[a], k[a], m[a];
> +
> +int main() {
> + for (int i = 0; i < a; ++i) { e[i]=1; g[i] = 86; x[i] = 86; }
> + for (size_t n = 0, avl = a; avl;) {
> + size_t o = __riscv_vsetvl_e64m8(avl);
> + vuint8m1_t p = __riscv_vle8_v_u8m1(&e[n], o);
> + vbool8_t q = __riscv_vmseq_vx_u8m1_b8(p, 1, o);
> + vuint64m8_t r = __riscv_vsll_vx_u64m8(__riscv_vid_v_u64m8(o), 3, o);
> + vint64m8_t s = __riscv_vluxei64_v_i64m8_tum(
> + __riscv_vlm_v_b8(std::vector<uint8_t>(o + 7).data(), o),
> + __riscv_vmv_v_x_i64m8(0, __riscv_vsetvlmax_e16m2()), &f[n], r, o);
> + vuint32m4_t t = __riscv_vsll_vx_u32m4(__riscv_vid_v_u32m4(o), 3, o);
> + vint64m8_t u = __riscv_vluxei32(&g[n], t, o);
> + vbool8_t v = __riscv_vlm_v_b8(&x[n], o);
> + __riscv_vle32ff_v_f32m4_mu(q, __riscv_vfmv_v_f_f32m4(0, __riscv_vsetvlmax_e8m1()), &j[n], &o, o);
> + vfloat32m1_t w = __riscv_vfmv_v_f_f32m1(0, __riscv_vsetvlmax_e32m1());
> + vfloat32m1_t aa = __riscv_vle32_v_f32m1_tu(w, &k[n], o);
> + s = __riscv_vcompress_vm_i64m8_tu(s, u, v, o);
> + vfloat32mf2_t ab = __riscv_vlmul_trunc_v_f32m1_f32mf2(aa);
> + vuint64m8_t ac = __riscv_vsll_vx_u64m8(__riscv_vid_v_u64m8(o), 3, o);
> + __riscv_vsuxei64_v_i64m8(&l[n], ac, s, o);
> + __riscv_vse32_v_f32mf2(&m[n], ab, o);
> + avl -= o;
> + }
> +
> + /* Results are inconsistent between different VLENs.
> + "n" never changes so we will always store into l[0...] with a length of
> + "o". What differs is "s".
> + At zvl128b and zvl256b we have more than one loop iteration and
> + "s" will be {86, 86, -1, -1} or {86, 86, 0, 0} depending on the
> + tail/mask policy.
> + At zvl512b there is only one iteration and s = {86, 86, 86, ...}.
> + I cross checked with clang and this seems correct.
> + Therefore only check l's fifth element.
> + The actual PR is about fault-only-first loads and the wrong code
> + caused element 5 to be incorrect as well. */
> + if (l[5] != 86)
> + __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr123808.C b/gcc/testsuite/g++.target/riscv/rvv/base/pr123808.C
> new file mode 100644
> index 00000000000..f3bce35ed0c
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr123808.C
> @@ -0,0 +1,50 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target riscv_v_ok } */
> +/* { dg-add-options riscv_v } */
> +
> +#include <riscv_vector.h>
> +#include <vector>
> +#define a 36
> +
> +uint8_t e[a], x[a];
> +int64_t f[a], g[a], l[a];
> +float j[a], k[a], m[a];
> +
> +int main() {
> + for (int i = 0; i < a; ++i) { e[i]=1; g[i] = 86; x[i] = 86; }
> + for (size_t n = 0, avl = a; avl;) {
> + size_t o = __riscv_vsetvl_e64m8(avl);
> + vuint8m1_t p = __riscv_vle8_v_u8m1(&e[n], o);
> + vbool8_t q = __riscv_vmseq_vx_u8m1_b8(p, 1, o);
> + vuint64m8_t r = __riscv_vsll_vx_u64m8(__riscv_vid_v_u64m8(o), 3, o);
> + vint64m8_t s = __riscv_vluxei64_v_i64m8_tum(
> + __riscv_vlm_v_b8(std::vector<uint8_t>(o + 7).data(), o),
> + __riscv_vmv_v_x_i64m8(0, __riscv_vsetvlmax_e16m2()), &f[n], r, o);
> + vuint32m4_t t = __riscv_vsll_vx_u32m4(__riscv_vid_v_u32m4(o), 3, o);
> + vint64m8_t u = __riscv_vluxei32(&g[n], t, o);
> + vbool8_t v = __riscv_vlm_v_b8(&x[n], o);
> + __riscv_vle32ff_v_f32m4_mu(q, __riscv_vfmv_v_f_f32m4(0, __riscv_vsetvlmax_e8m1()), &j[n], &o, o);
> + vfloat32m1_t w = __riscv_vfmv_v_f_f32m1(0, __riscv_vsetvlmax_e32m1());
> + vfloat32m1_t aa = __riscv_vle32_v_f32m1_tu(w, &k[n], o);
> + s = __riscv_vcompress_vm_i64m8_tu(s, u, v, o);
> + vfloat32mf2_t ab = __riscv_vlmul_trunc_v_f32m1_f32mf2(aa);
> + vuint64m8_t ac = __riscv_vsll_vx_u64m8(__riscv_vid_v_u64m8(o), 3, o);
> + __riscv_vsuxei64_v_i64m8(&l[n], ac, s, o);
> + __riscv_vse32_v_f32mf2(&m[n], ab, o);
> + avl -= o;
> + }
> +
> + /* Results are inconsistent between different VLENs.
> + "n" never changes so we will always store into l[0...] with a length of
> + "o". What differs is "s".
> + At zvl128b and zvl256b we have more than one loop iteration and
> + "s" will be {86, 86, -1, -1} or {86, 86, 0, 0} depending on the
> + tail/mask policy.
> + At zvl512b there is only one iteration and s = {86, 86, 86, ...}.
> + I cross checked with clang and this seems correct.
> + Therefore only check l's fifth element.
> + The actual PR is about fault-only-first loads and the wrong code
> + caused element 5 to be incorrect as well. */
> + if (l[5] != 86)
> + __builtin_abort ();
> +}
> --
> 2.52.0
>