Hi, Richard.

Assertion failed at this IR: 

  _427 = _425 & _426;
  _429 = present$0_16(D) != 0;
  _430 = _425 & _429;
  _409 = _430 | _445;
  _410 = _409 | _449;
  _411 = .LOOP_VECTORIZED (3, 6);
  if (_411 != 0)
    goto <bb 267>; [100.00%]
  else
    goto <bb 268>; [100.00%]

  <bb 267> [local count: 3280550]:

  <bb 83> [local count: 29823181]:
....
    pretmp_56 = .MASK_LOAD (_293, 32B, _427);   ---> cause assertion failed.

You can take a look this ifcvt IR and search 'pretmp_56 = .MASK_LOAD (_293, 
32B, _427);'
RVV is totally the same IR as ARM SVE:
https://godbolt.org/z/rPbzfExWP 

I was struggling at this issue for a few days but failed to figure out why.
I am sorry about that. Could you help me with that?

And I adjust the code as follows:
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..42f85839c6e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9843,10 +9843,17 @@ vectorizable_load (vec_info *vinfo,
       mask_index = internal_fn_mask_index (ifn);
       if (mask_index >= 0 && slp_node)
        mask_index = vect_slp_child_index_for_operand (call, mask_index);
+      slp_tree slp_op = NULL;
       if (mask_index >= 0
          && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
-                                     &mask, NULL, &mask_dt, &mask_vectype))
+                                     &mask,
+                                     ifn == IFN_MASK_LEN_GATHER_LOAD ? &slp_op
+                                                                     : NULL,
+                                     &mask_dt, &mask_vectype))
        return false;
+      if (mask_index >= 0 && slp_node
+         && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+       gcc_unreachable ();
     }

As you can see, except MASK_LEN_GATHER_LOAD, other LOADs, I pass 'NULL' same as 
before.
Only MASK_LEN_GATHER_LOAD passes '&slp_op'. It works fine for RVV. But I don't 
think it's a correct code, we may need to find another solution.

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-20 06:19
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html 
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
> node has a known vectype by this point.  I think a comment would be useful.
>
> Since I didn't encounter mismatched case in the regression of RISC-V and X86, 
> so 
> I fix it in V4 patch as follows:
> +      if (mask_index >= 0 && slp_node)
> + {
> +   bool match_p
> +     = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> +   gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, recently an ICE suddenly appear today in RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in 
> vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we are encountering that mask_vectype is boolean type and it 
> is external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> +      if (mask_index >= 0 && slp_node
> +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> +   /* We don't vectorize the boolean type external SLP mask.  */
> +   if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +      "incompatible vector types for invariants\n");
> +   return false;
> + }
>
> Bootstrap and Regression on x86 passed.
 
Why are external defs a problem though?  E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains
 
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   ...repeating...
 
The type of the mask can be driven by the code that needs it.
 
Thanks,
Richard
 
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes this following FAILs in RISC-V regression:
>  
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
>  
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
>  
> We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
>  
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, 
> condtional mask).
>    
>    This situation we just need to leverage the current MASK_GATHER_LOAD which 
> can achieve SLP MASK_LEN_GATHER_LOAD.
>  
> 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, 
> zero, -1)
>    
>    Current SLP check will failed on dummy mask -1, so we relax the check in 
> tree-vect-slp.cc and allow it to be materialized.
>     
> Consider this following case:
>  
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       y[i * 2] = x[indices[i * 2]] + 1;
>       y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
>     }
> }
>  
> https://godbolt.org/z/WG3M3n7Mo
>  
> GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
>  
> f:
>         ble     a3,zero,.L5
> .L3:
>         vsetvli a5,a3,e8,mf4,ta,ma
>         vsetvli zero,a5,e32,m1,ta,ma
>         vlseg2e32.v     v6,(a2)
>         vsetvli a4,zero,e64,m2,ta,ma
>         vsext.vf2       v2,v6
>         vsll.vi v2,v2,2
>         vsetvli zero,a5,e32,m1,ta,ma
>         vluxei64.v      v1,(a1),v2
>         vsetvli a4,zero,e64,m2,ta,ma
>         vsext.vf2       v2,v7
>         vsetvli zero,zero,e32,m1,ta,ma
>         vadd.vi v4,v1,1
>         vsetvli zero,zero,e64,m2,ta,ma
>         vsll.vi v2,v2,2
>         vsetvli zero,a5,e32,m1,ta,ma
>         vluxei64.v      v2,(a1),v2
>         vsetvli a4,zero,e32,m1,ta,ma
>         slli    a6,a5,3
>         vadd.vi v5,v2,2
>         sub     a3,a3,a5
>         vsetvli zero,a5,e32,m1,ta,ma
>         vsseg2e32.v     v4,(a0)
>         add     a2,a2,a6
>         add     a0,a0,a6
>         bne     a3,zero,.L3
> .L5:
>         ret
>  
> After this patch:
>  
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v1,0(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v1
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> mv a6,a3
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v2,0(a0)
> add a2,a2,t1
> add a0,a0,t1
> add a3,a3,t3
> bgtu a6,a7,.L3
> .L5:
> ret
>  
> Note that I found we are missing conditional mask gather_load SLP test, 
> Append a test for it in this patch.
>  
> Tested on RISC-V and Bootstrap && Regression on X86 passed.
>  
> Ok for trunk ?
>  
> gcc/ChangeLog:
>  
> * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
> (vect_get_and_check_slp_defs): Ditto.
> (vect_build_slp_tree_1): Ditto.
> (vect_build_slp_tree_2): Ditto.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
>  
> gcc/testsuite/ChangeLog:
>  
> * gcc.dg/vect/vect-gather-6.c: New test.
>  
> ---
> gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
> gcc/tree-vect-slp.cc                      | 22 ++++++++++++++++++----
> gcc/tree-vect-stmts.cc                    | 12 +++++++++++-
> 3 files changed, 44 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> new file mode 100644
> index 00000000000..ff55f321854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict 
> cond, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      if (cond[i * 2])
> + y[i * 2] = x[indices[i * 2]] + 1;
> +      if (cond[i * 2 + 1])
> + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target 
> vect_gather_load_ifn } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index d081999a763..146dba658a2 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
> swap = 0)
>     return arg1_map;
>   case IFN_MASK_GATHER_LOAD:
> +   case IFN_MASK_LEN_GATHER_LOAD:
>     return arg1_arg4_map;
>   case IFN_MASK_STORE:
> @@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
> {
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
> -   if ((dt == vect_constant_def
> -        || dt == vect_external_def)
> +   if (dt == vect_external_def
>       && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>       && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> @@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
> "for variable-length SLP %T\n", oprnd);
>       return -1;
>     }
> +   if (dt == vect_constant_def
> +       && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> +       && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> +     {
> +       if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "Build SLP failed: invalid type of def "
> + "for variable-length SLP %T\n",
> + oprnd);
> +       return -1;
> +     }
>   /* For the swapping logic below force vect_reduction_def
>      for the reduction op in a SLP reduction group.  */
> @@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>   if (cfn == CFN_MASK_LOAD
>       || cfn == CFN_GATHER_LOAD
> -       || cfn == CFN_MASK_GATHER_LOAD)
> +       || cfn == CFN_MASK_GATHER_LOAD
> +       || cfn == CFN_MASK_LEN_GATHER_LOAD)
>     ldst_p = true;
>   else if (cfn == CFN_MASK_STORE)
>     {
> @@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>   if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
>       && rhs_code != CFN_GATHER_LOAD
>       && rhs_code != CFN_MASK_GATHER_LOAD
> +       && rhs_code != CFN_MASK_LEN_GATHER_LOAD
>       /* Not grouped loads are handled as externals for BB
> vectorization.  For loop vectorization we can handle
> splats the same we handle single element interleaving.  */
> @@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>        if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
> gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
>     || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> -     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> +     || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> +     || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
>        else
> {
>   *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index e5ff44c25f1..0bc7beb1342 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
> return false;
>        mask_index = internal_fn_mask_index (ifn);
> +      slp_tree slp_op = NULL;
>        if (mask_index >= 0 && slp_node)
> mask_index = vect_slp_child_index_for_operand (call, mask_index);
>        if (mask_index >= 0
>   && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> -       &mask, NULL, &mask_dt, &mask_vectype))
> +       &mask, &slp_op, &mask_dt, &mask_vectype))
> return false;
> +      if (mask_index >= 0 && slp_node
> +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> +   /* We don't vectorize the boolean type external SLP mask.  */
> +   if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +      "incompatible vector types for invariants\n");
> +   return false;
> + }
>      }
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
 

Reply via email to