https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430

            Bug ID: 110430
           Summary: Fail to CSE for LEN_MASK_STORE
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

void __attribute__((noinline,noclone))
foo (int *out, int *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

-O3 -march=rv64gcv_zvl512b --param riscv-autovec-preference=fixed-vlmax
Current RVV auto-vectorization codegen:

foo:
        lui     a5,%hi(.LANCHOR0)
        vsetivli        zero,16,e32,m1,ta,ma
        addi    a5,a5,%lo(.LANCHOR0)
        vid.v   v1
        vlm.v   v0,0(a5)
        vsetvli a5,zero,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        lw      a2,0(a0)
        lw      a3,28(a0)
        lw      a4,56(a0)
        lw      a5,60(a0)
        sw      a2,0(a1)
        sw      a3,8(a1)
        sw      a4,16(a1)
        sw      a5,24(a1)
        ret

However, with this patch:
https://patchwork.sourceware.org/project/gcc/patch/20230627064737.16257-1-juzhe.zh...@rivai.ai/

we get better codegen, because CSE now applies:

foo:
        lui     a5,%hi(.LANCHOR0)
        vsetivli        zero,16,e32,m1,ta,ma
        addi    a5,a5,%lo(.LANCHOR0)
        vid.v   v1
        vlm.v   v0,0(a5)
        vsetvli a5,zero,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        lw      a4,0(a0)
        lw      a5,56(a0)
        sw      a4,0(a1)
        sw      a5,16(a1)
        li      a4,7
        li      a5,15
        sw      a4,8(a1)
        sw      a5,24(a1)
        ret

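Two of the "lw" instructions are CSE'd into "li" instructions: lanes 7 and 15
are stored under an active mask bit, so the values loaded back are known
constants. GIMPLE IR with the patch: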

.LEN_MASK_STORE (out_10(D), 32B, 16, { 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0,
-1, 0, -1, 0, -1 }, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
0);
  o0_11 = *out_10(D);
  o14_13 = MEM[(int *)out_10(D) + 56B];
  *res_15(D) = o0_11;
  MEM[(int *)res_15(D) + 8B] = 7;
  MEM[(int *)res_15(D) + 16B] = o14_13;
  MEM[(int *)res_15(D) + 24B] = 15;
  mask ={v} {CLOBBER(eol)};
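
To make the reasoning explicit, here is a minimal scalar sketch of what the
store above does and of when a later load can be replaced by a constant. It
is only a model: the function names are hypothetical and are not GCC
internals, and the rule "lane i is written iff i < len + bias and mask[i] is
nonzero" is my reading of the operand order in the dump (ptr, align, len,
mask, values, bias), not a statement of how the patch is implemented.

```c
#include <stddef.h>

/* Hypothetical scalar model of
     .LEN_MASK_STORE (ptr, align, len, mask, vec, bias)
   Assumption: lane i is written only when i < len + bias and mask[i]
   is nonzero; all other lanes of memory are left unchanged.  */
void
len_mask_store_model (int *ptr, int len, int bias, const int *mask,
                      const int *vec, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; ++i)
    if ((long) i < (long) len + bias && mask[i] != 0)
      ptr[i] = vec[i];
}

/* Returns 1 when a load of lane IDX right after the store is known to
   read vec[IDX], i.e. when the load could be folded to a constant.
   Returns 0 when the lane is inactive and memory must really be read.  */
int
lane_value_known (int len, int bias, const int *mask, size_t idx)
{
  return (long) idx < (long) len + bias && mask[idx] != 0;
}
```

With the mask and values from the dump, lanes 7 and 15 are active, so o7 and
o15 fold to 7 and 15. Lanes 0 and 14 are inactive, so o0 and o14 still have
to be loaded from out.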

After discussion with Richi, it turns out that the current candidate patch
can only handle VLS (fixed-length) vectors; it cannot handle VLA
(variable-length) vectors.

It is hard for us to write a C testcase that exposes a CSE opportunity for
VLA vectors.

So I am opening this bug for now so that I do not forget the issue.
I will enhance CSE for LEN_MASK_STORE after I have finished all of the RVV
auto-vectorization support.
