https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121523

--- Comment #5 from Pengfei Li <pfustc at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #4)
> 
> OK, I'll post this as a patch after some testing.  I was just speculating
> you'd have any insight why we're seeing this on riscv but not on aarch64. 
> I'd have expected aarch64 uses the same code path but probably
> length-control changes things slightly.
> 
> Anyway the -1 value for bound doesn't make sense the way it's used any way,
> so I'll go ahead.

Hi Robin,

Apologies for breaking RISC-V. Here are some insights I can share:

1) AArch64 doesn't vectorize Anton's minimal loop in VLA modes. Current VLA
peeling on AArch64 only uses "peeling with masking", i.e. executing the loop's
first iteration with partial vectors to avoid scalar peels. But since the loop
mixes 64-bit pointers and 8-bit chars, it has "ncopies > 1", so partial vectors
for "peeling with masking" cannot be used. In the end, the loop is vectorized
with AdvSIMD instead of SVE VLA.

2) So far, I haven't reproduced the ICE on AArch64 with other loops. I tried
modifying Anton's loop as below, changing "char *b" to "long *b" so that it
gets vectorized with SVE VLA.

int a(void)
{
        long *b;
        while (b < a && *b)
                b++;
        if (b >= a)
                return 0;
}

But I still don't get an ICE on AArch64 with this, as the call to
scale_loop_profile() is skipped because prolog_peeling == 0 on AArch64.

3) I cross-built a RISC-V cc1 and did reproduce the ICE with my modified case
above (the one with long*). What I found is that the code paths of AArch64 and
RISC-V diverge at the following point in vect_do_peeling():

if (!vect_use_loop_mask_for_alignment_p (loop_vinfo))
  prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);

AArch64 returns true for vect_use_loop_mask_for_alignment_p (loop_vinfo), so
prolog_peeling stays 0 and the rest of the prolog peeling logic is skipped. But
on RISC-V, loop_vinfo->masks is empty, so scale_loop_profile() with BOUND = -1
is hit.
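
To make the divergence concrete, here is a toy model of the check quoted above.
It is only a sketch, not GCC code: the toy_* functions are hypothetical, the
boolean parameter stands in for vect_use_loop_mask_for_alignment_p
(loop_vinfo), and I'm assuming that a peeling-for-alignment value of -1 means
the number of iterations to peel is not known at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the quoted snippet from vect_do_peeling(); NOT GCC code.
   use_mask_for_alignment models vect_use_loop_mask_for_alignment_p (loop_vinfo),
   peeling_for_alignment models LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo).  */
static int
toy_prolog_peeling (bool use_mask_for_alignment, int peeling_for_alignment)
{
  int prolog_peeling = 0;
  if (!use_mask_for_alignment)
    prolog_peeling = peeling_for_alignment;
  return prolog_peeling;
}

/* Hypothetical helper: in this model the prolog handling (and with it the
   profile scaling) is only reached when prolog_peeling is nonzero.  */
static bool
toy_reaches_prolog_scaling (bool use_mask_for_alignment,
                            int peeling_for_alignment)
{
  return toy_prolog_peeling (use_mask_for_alignment,
                             peeling_for_alignment) != 0;
}
```

With the mask-based path (the AArch64 case described above) the model never
reaches the prolog scaling, whatever the peeling value is. Without it (the
RISC-V case) an assumed "unknown" value of -1 is passed straight through.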

4) As in your investigation, I see that we use "bound - 1 = -2" as the scale
factor for the loop profile, which causes the ICE. Given my limited knowledge
of RISC-V and length-control vectorization, I cannot offer much advice on the
fix at the moment. I don't know whether RVV similarly executes the loop's first
iteration with partial vectors or just uses scalar peeling for VLA. Setting
bound to 0 looks straightforward, but I'm not sure whether it has side effects.
(If it's used for computing the cost, should we set it to an estimated value
instead when the number of iterations to peel is unknown?)
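
The arithmetic behind "bound - 1 = -2" can be sketched as below. Again this is
a toy illustration rather than the actual GCC code: both function names are
hypothetical, and the clamped variant is just one possible reading of "setting
bound to 0", not a proposed patch.

```c
#include <assert.h>

/* Toy model: the value handed on as the profile bound is "bound - 1".  */
static long
toy_scale_bound (long bound)
{
  return bound - 1;
}

/* Hypothetical guard: treat a negative (unknown) bound as 0 before
   subtracting.  Whether the result is a sensible input for the real
   scale_loop_profile() is outside the scope of this sketch.  */
static long
toy_scale_bound_clamped (long bound)
{
  if (bound < 0)
    bound = 0;
  return bound - 1;
}
```

With bound = -1 the first function yields -2, the value seen in the ICE. The
clamped variant yields -1 for the same input and leaves non-negative bounds
unchanged.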
