I think it is vectorized with 128-bit vectors too.

  vector(4) int vect__9.9;
  vector(4) int vect__2.6;
  vector(4) int vect__1.5;
  int _1;
  int _5;
  int _11;
  int _13;
  vector(4) int _27;

  <bb 2> [local count: 1073741824]:
  vect__1.5_24 = MEM <vector(4) int> [(int *)&b];
  vect__2.6_25 = vect__1.5_24 + { 1, 2, 3, 4 };
  _1 = b[0];
  _5 = b[2];
  MEM <vector(4) int> [(int *)&a] = vect__2.6_25;
  _11 = b[4];
  _13 = b[6];
  _27 = {_1, _5, _11, _13};
  vect__9.9_28 = _27 * { 3, 4, 5, 7 };
  MEM <vector(4) int> [(int *)&a + 16B] = vect__9.9_28;


We can confirm it here: https://godbolt.org/z/6jGrEoz9s
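
For readers without the testcase at hand, here is a minimal C sketch of the scalar code the dump above appears to correspond to. It is read back from the GIMPLE constants and indices, so it is an assumption rather than the verbatim text of bb-slp-subgroups-3.c. The names subgroups_kernel and run_example are hypothetical.

```c
#include <assert.h>

/* Globals named after the arrays in the dump (&a, &b).  */
int a[8];
int b[8];

/* Hypothetical name; the statements are inferred from the dump.  */
__attribute__ ((noinline)) void
subgroups_kernel (void)
{
  /* First subgroup: contiguous b[0..3] plus { 1, 2, 3, 4 }.
     In the dump this is one vector load and one vector add.  */
  a[0] = b[0] + 1;
  a[1] = b[1] + 2;
  a[2] = b[2] + 3;
  a[3] = b[3] + 4;

  /* Second subgroup: b[0], b[2], b[4], b[6] times { 3, 4, 5, 7 }.
     In the dump these are four scalar loads gathered into _27.  */
  a[4] = b[0] * 3;
  a[5] = b[2] * 4;
  a[6] = b[4] * 5;
  a[7] = b[6] * 7;
}

/* Hypothetical driver: fill b with 4..11 and check the results,
   which were computed by hand from the statements above.  */
void
run_example (void)
{
  for (int i = 0; i < 8; i++)
    b[i] = i + 4;

  subgroups_kernel ();

  assert (a[0] == 5 && a[1] == 7 && a[2] == 9 && a[3] == 11);
  assert (a[4] == 12 && a[5] == 24 && a[6] == 40 && a[7] == 70);
}
```

Both halves read from b[0] and b[2], so the two stores share one load group even though they are vectorized as separate subgroups.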



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2024-01-16 15:43
To: Juzhe-Zhong
CC: gcc-patches; pinskia
Subject: Re: [PATCH] test regression fix: Remove xfail for variable length targets of bb-slp-subgroups-3.c
On Tue, 16 Jan 2024, Juzhe-Zhong wrote:
 
> I noticed a recent regression:
> XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2
> XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: basic block" 2
> 
> Checked on both ARM SVE and RVV:
> 
> https://godbolt.org/z/jz4cYbqc8
> 
> "optimized: basic block" appears twice.
> 
> I guess ARM SVE has the same XPASS as RVV.
> 
> Hi, Andrew. Could you confirm this?
 
How does it vectorize it?  See the comments in the testcase.  The
intent was to check that we can split the store and vectorize the
add and the multiplication separately even when both are fed from
the same load group.  So ideally we'd add something similar to what
bb-slp-43.c does, checking that "vector operands from scalars" does
not appear in the dump.
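
A minimal sketch of such a check, assuming the same "slp2" dump the testcase already scans. The directive actually used in bb-slp-43.c may carry additional target selectors, so treat this form as illustrative only:

```c
/* Assumed form: fail if the BB vectorizer had to build a vector
   operand from scalars rather than from a vector load.  */
/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" } } */
```

Note that the dump quoted at the top of this message builds _27 from four scalar loads, which looks like the pattern such a check is meant to catch.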
 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/bb-slp-subgroups-3.c: Remove XFAIL of variable length.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> index fb719915db7..3f0d45ce4a1 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
> @@ -42,7 +42,7 @@ main (int argc, char **argv)
>  /* Because we disable the cost model, targets with variable-length
>     vectors can end up vectorizing the store to a[0..7] on its own.
>     With the cost model we do something sensible.  */
> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail vect_variable_length } } } */
> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } } } } */
>  
>  /* amdgcn can do this in one vector.  */
>  /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { target amdgcn-*-* } } } */
> 
 
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 
