Also, I have already decided to remove the WHILE_LEN pattern, since it seems to 
be unnecessary.
As Richard said, it is just simple arithmetic, so a dedicated pattern is not 
worthwhile.

So I plan to replace WHILE_LEN with MIN_EXPR and do everything RVV-specific 
in the RISC-V port.
I think that is more reasonable for IBM's use and for other targets in the future.

So this patch will need to be changed to "introduce a new flow for 
vectorization loop control": a new loop control flow that counts n down to 
zero with a saturating subtraction, plus a target hook so that a target can 
switch to this flow?
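
To make the proposed flow concrete, here is a rough scalar model in C. It is 
only a sketch under assumptions: VF and add_one_decrement_iv are hypothetical 
names, the inner loop stands in for len_load/len_store, and the real 
vectorizer output will of course look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor, in elements; a real target supplies it. */
enum { VF = 16 };

/* Scalar model of the proposed loop control for dst[i] = src[i] + 1.
   Returns the number of vector iterations executed. */
static size_t add_one_decrement_iv(int32_t *dst, const int32_t *src, size_t n)
{
    size_t remaining = n;  /* decrementing IV, counts down to zero */
    size_t offset = 0;     /* address IV, in elements */
    size_t iterations = 0;

    while (remaining > 0) {
        /* len = MIN_EXPR (remaining, VF) */
        size_t len = remaining < VF ? remaining : VF;
        assert(len <= remaining);

        /* Models len_load, the vector add, and len_store on len elements. */
        for (size_t i = 0; i < len; i++)
            dst[offset + i] = src[offset + i] + 1;

        offset += VF;      /* advance by the full vector, as in the patch */
        remaining -= len;  /* saturates at zero because len <= remaining */
        iterations++;
    }
    return iterations;
}
```

With MIN_EXPR, len equals VF on every iteration except possibly the last, so 
advancing the address by the full vector size is safe in this model.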

Is that more reasonable?
Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-14 10:54
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi Juzhe,
 
on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch is, as you said:
> ....
> len = WHILE_LEN (n,vf);
> ...
> v = len_load (addr,len);
> ..
> addr = addr + vf (in byte align);
> ....
> 
> This patch just keeps advancing the address by the vectorization factor 
> (scaled to bytes).
> For example, if the vector length is 512 bits, this patch just updates the 
> address as
> addr = addr + 64;
> 
> However, after reading the RVV ISA more closely today, I think it is more 
> appropriate to update the address as addr = addr + (len * 4), where len is 
> the number of INT32 elements.
> Here len is the result computed by WHILE_LEN.
 
I just read your detailed explanation of how the vsetvli insn is used (I 
really appreciate that). It looks like this WHILE_LEN wants more semantics 
than MIN, so I assume you still want to introduce this WHILE_LEN.
 
> 
> I assume that for the IBM targets it is better to update the address IV by 
> directly adding the whole register size in bytes.
> I think the second way (addr = addr + (len * 4)) is too RVV-specific and 
> won't be suitable for IBM. Is that right?
 
Yes, we just want to add the whole vector register length in bytes.
 
> If that is true, I will keep this patch's flow (and won't change it to 
> addr = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do the RVV-specific part in the RISC-V backend port.
 
IMHO, you don't need to push this down to the RV backend. Just query the ports 
that have len_{load,store} support, using a target hook or a special operand 
in the while_len optab (see internal_len_load_store_bias), and generate 
different code accordingly.  IIUC, for WHILE_LEN you want the semantics of 
what vsetvli performs, but for the IBM ports it would be just like MIN_EXPR, 
so maybe we can also generate MIN or WHILE_LEN based on this kind of target 
information.
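 
To illustrate the distinction, here is a small scalar model in C. It is a 
sketch under assumptions: all names are hypothetical, select_len_fn merely 
stands in for a target hook or optab query, and len_vsetvli_like models just 
one choice that, as I understand the RVV spec's constraints on vl, vsetvli may 
make when the remaining count is between VF and 2*VF.

```c
#include <assert.h>
#include <stddef.h>

/* MIN_EXPR-style length, enough for ports that add the full vector size. */
static size_t len_min(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* One vsetvli-like choice (assumption): split the tail evenly when
   vf < remaining < 2 * vf, otherwise behave like MIN. */
static size_t len_vsetvli_like(size_t remaining, size_t vf)
{
    if (remaining <= vf)
        return remaining;
    if (remaining < 2 * vf)
        return (remaining + 1) / 2;
    return vf;
}

typedef size_t (*len_fn)(size_t remaining, size_t vf);

/* Hypothetical selector standing in for a target hook or optab operand. */
static len_fn select_len_fn(int target_wants_vsetvli_semantics)
{
    return target_wants_vsetvli_semantics ? len_vsetvli_like : len_min;
}

/* Returns 1 if the address IV always points at the next unprocessed element,
   0 if it skips elements.  advance_by_len picks addr += len over addr += vf. */
static int covers_all(len_fn f, size_t n, size_t vf, int advance_by_len)
{
    size_t remaining = n, offset = 0, done = 0;

    while (remaining > 0) {
        size_t len = f(remaining, vf);
        assert(len > 0 && len <= remaining);
        if (offset != done)
            return 0;  /* the address IV ran ahead of the processed data */
        done += len;
        offset += advance_by_len ? len : vf;
        remaining -= len;
    }
    return 1;
}
```

In this model, MIN works with the "add the whole vector" address update, while 
the vsetvli-like length only works when the address advances by len, which is 
why the two kinds of port would want different code generated.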
 
If the above assumption holds, I wonder if you also want WHILE_LEN to have the 
implicit effect of updating the vector length register?  If yes, the code 
generated for multiple rgroups looks unexpected:
 
+ _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+ _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
 
as the latter one seems to override the former.  Besides, if the given operands 
are known constants, it can't be folded directly into a constant and 
propagated further.  From this perspective, Richi's suggestion on "tieing the 
scalar result with the uses" looks better IMHO.
 
> 
>>> I tried
>>>to compile the above source files on Power, the former can adopt doloop
>>>optimization but the latter fails to. 
> You mean GCC cannot do hardware loop optimization when the IV loop control 
> is variable?
 
No, in both cases the IV is variable.  The dump at loop2_doloop for the 
proposed sequence says "Doloop: Possible infinite iteration case."  This seems 
to show that, for the proposed sequence, the compiler isn't able to figure out 
that the loop is finite.  It may be missing the range information on n, or it 
may be unable to analyze how the invariant is involved, but I didn't look into 
it, so these are all just guesses.
 
BR,
Kewen
 
