Hi Juzhe,
on 2023/4/12 21:22, 钟居哲 wrote:
> Thanks Kewen.
>
> It seems that this proposal WHILE_LEN can help s390 when using --param
> vect-partial-vector-usage=2 compile option.
>
Yeah, IMHO, the previous sequence and the proposed sequence look like this:
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
    return 0;
  int iv = 0;
  int len = MIN (n, 16);
  int sum = 0;
  do
    {
      sum += a[len] + b[len];
      iv += 16;
      int n1 = MIN (n, iv); // line A
      int n2 = n - n1;
      len = MIN (n2, 16);
    }
  while (n > iv);
  return sum;
}
vs.
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
    return 0;
  int len;
  int sum = 0;
  do
    {
      len = MIN (n, 16);
      sum += a[len] + b[len];
      n -= len;
    }
  while (n > 0);
  return sum;
}
It at least saves one MIN (at line A) and one length preparation in the
last iteration (which is useless since the loop ends there). But I think
the concern that the proposed IV isn't recognized as a simple IV may
remain. I tried compiling the two source files above on Power: the former
can adopt the doloop optimization but the latter fails to.
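For illustration only, here is a minimal sketch that records the per-iteration
lengths produced by the two sequences above and checks that they agree. It is
not part of the patch; the names VF, lens_previous, lens_proposed and
schemes_agree are hypothetical, and MIN is defined locally as a plain macro.

```c
#include <assert.h>

#define VF 16 /* lanes per vector iteration, matching the 16 used above */
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Previous sequence: the next length is derived from a separate counter IV.
   Stores each length used into LENS and returns the iteration count.  */
int
lens_previous (int n, int *lens)
{
  assert (n > 0);
  int iters = 0;
  int iv = 0;
  int len = MIN (n, VF);
  do
    {
      lens[iters++] = len;
      iv += VF;
      int n1 = MIN (n, iv); /* line A in the first listing */
      int n2 = n - n1;
      len = MIN (n2, VF);   /* computed even when the loop is about to exit */
    }
  while (n > iv);
  return iters;
}

/* Proposed sequence: the remaining count itself is decremented by LEN.  */
int
lens_proposed (int n, int *lens)
{
  assert (n > 0);
  int iters = 0;
  do
    {
      int len = MIN (n, VF);
      lens[iters++] = len;
      n -= len;
    }
  while (n > 0);
  return iters;
}

/* Returns 1 if both sequences yield the same lengths and those lengths
   sum to N, otherwise 0.  Supports up to 64 iterations.  */
int
schemes_agree (int n)
{
  int a[64], b[64];
  assert (n > 0 && n <= 64 * VF);
  int ia = lens_previous (n, a);
  int ib = lens_proposed (n, b);
  if (ia != ib)
    return 0;
  int total = 0;
  for (int i = 0; i < ia; i++)
    {
      if (a[i] != b[i])
        return 0;
      total += a[i];
    }
  return total == n;
}
```

Under these assumptions, n = 40 gives the lengths 16, 16, 8 in both variants;
the difference is only in how the next length is prepared, not in which
elements are covered.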
> Would you mind apply this patch && support WHILE_LEN in s390 backend and test
> it to see the overall benefits for s390
> as well as the correctness of this sequence?
Sure, if all of you think this approach and this revision are good enough to
go forward for this kind of evaluation, I'm happy to give it a shot, but only
for rs6000. ;-) I noticed that there is some discussion about withdrawing
WHILE_LEN and using MIN_EXPR instead, so I'll stay tuned.
btw, we currently only adopt vector with length on the epilogues rather than
on the main vectorized loops, because length preparation has non-trivial extra
cost compared with just using normal vector load/store (all lanes). So we
don't care much about the performance with --param
vect-partial-vector-usage=2. Even if this new proposal can optimize the length
preparation for --param vect-partial-vector-usage=2, the extra cost of length
preparation is still unavoidable (MIN, shifting, one more GPR used), so we
would still stay with the default --param vect-partial-vector-usage=1 (which
can't benefit from this new proposal).
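As a rough scalar model of that split (an illustrative sketch, not what GCC
actually generates), the main loop below handles only full vectors and needs
no length computation, while a single length-controlled step covers the
leftover lanes. VF, sum_lanes and sum_epilogue_with_length are hypothetical
names.

```c
#include <assert.h>

#define VF 16 /* assumed lanes per full vector */

/* Scalar stand-in for one vector step over LEN active lanes.  */
static int
sum_lanes (const int *a, const int *b, int len)
{
  assert (len >= 0 && len <= VF);
  int s = 0;
  for (int i = 0; i < len; i++)
    s += a[i] + b[i];
  return s;
}

/* Main loop on full vectors, then one length-controlled epilogue step.  */
int
sum_epilogue_with_length (const int *a, const int *b, int n)
{
  int sum = 0;
  int i = 0;
  /* Main loop: all lanes active, so no MIN or length setup per iteration.  */
  for (; i + VF <= n; i += VF)
    sum += sum_lanes (a + i, b + i, VF);
  /* Epilogue: the length is prepared once, for the remaining n - i lanes.  */
  if (i < n)
    sum += sum_lanes (a + i, b + i, n - i);
  return sum;
}
```

In this model the length is computed at most once per call, which is why
optimizing the per-iteration length preparation does not help this
configuration.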
BR,
Kewen