"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes:
>>> I think that already works for them (could be misremembering).
>>> However, IIUC, they have no special instruction to calculate the
>>> length (unlike for RVV), and so it's open-coded using vect_get_len.
>
> Yeah, the current flow using min, sub, and then min in vect_get_len
> is working for IBM. But I wonder whether switching their
> length-based loop control to the WHILE_LEN pattern added by this patch
> could improve their performance.
>
>>> (1) How easy would it be to express WHILE_LEN in normal gimple?
>>>     I haven't thought about this at all, so the answer might be
>>>     "very hard".  But it reminds me a little of UQDEC on AArch64,
>>>     which we open-code using MAX_EXPR and MINUS_EXPR (see
>>>     vect_set_loop_controls_directly).
>>>
>>>     I'm not saying WHILE_LEN is the same operation, just that it seems
>>>     like it might be open-codeable in a similar way.
>>>
>>>     Even if we can open-code it, we'd still need some way for the
>>>     target to select the "RVV way" from the "s390/PowerPC way".
>
> The WHILE_LEN I define in the doc is
>
>   operand0 = MIN (operand1, operand2)
>
> where operand1 is the residual number of scalar elements that still
> need to be updated and operand2 is the vectorization factor (VF) for a
> single rgroup.  If there are multiple rgroups, operand2 = VF * nitems_per_ctrl.
>
> Do you mean that this pattern is not well expressed, so we need to
> replace it with normal tree codes (MIN or MAX) and let the RISC-V
> backend optimize them into vsetvl?  Sorry, maybe I am not on the
> same page.
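
A minimal C sketch of the WHILE_LEN semantics defined above, assuming a single rgroup and treating everything as plain scalars. The names while_len and plan_lengths are hypothetical illustrations, not GCC's internal function or any real API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of WHILE_LEN: operand0 = MIN (operand1, operand2),
   where REMAINING is the number of scalar elements still to be processed and
   VF is the vectorization factor (VF * nitems_per_ctrl for multiple rgroups).  */
static size_t while_len (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Step through N scalar elements the way a length-controlled loop would:
   iteration K handles LENS[K] elements and the remaining count is decreased
   by that (variable) amount until it reaches 0.  Returns the number of
   vector iterations.  */
static size_t plan_lengths (size_t n, size_t vf, size_t *lens, size_t max_iters)
{
  size_t iters = 0;
  assert (vf > 0);                 /* otherwise the loop never terminates */
  while (n > 0)
    {
      size_t len = while_len (n, vf);
      assert (iters < max_iters);  /* caller must provide enough room */
      lens[iters++] = len;
      n -= len;                    /* variable IV step */
    }
  return iters;
}
```

For example, with n = 10 and VF = 4 the lengths are 4, 4 and 2, so only the final iteration is partial.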

It's not so much that we need to do that.  But normally it's only worth
adding internal functions if they do something that is too complicated
to express in simple gimple arithmetic.  The UQDEC case I mentioned:

   z = MAX (x, y) - y

fell into the "simple arithmetic" category for me.  We could have added
an ifn for unsigned saturating decrement, but it didn't seem complicated
enough to merit its own ifn.
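
A minimal C sketch of that open-coding, assuming 64-bit unsigned operands (the function name sat_dec is hypothetical). The MAX followed by the MINUS gives x - y when x >= y and 0 otherwise, so no dedicated operation is needed:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating decrement open-coded as z = MAX (x, y) - y.  */
static uint64_t sat_dec (uint64_t x, uint64_t y)
{
  uint64_t max = x > y ? x : y;   /* MAX_EXPR */
  uint64_t z = max - y;           /* MINUS_EXPR */
  assert (z <= x);                /* never wraps around below zero */
  return z;
}
```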

>>> (2) What effect does using a variable IV step (the result of
>>> the WHILE_LEN) have on ivopts?  I remember experimenting with
>>> something similar once (can't remember the context) and not
>>> having a constant step prevented ivopts from making good
>>> addressing-mode choices.
>
> Thank you so much for pointing this out. Currently, a variable IV step
> with n decreasing down to 0 works fine in our downstream RISC-V GCC,
> and we didn't find any issues related to addressing-mode selection.

OK, that's good.  Sounds like it isn't a problem then.

> I think I must have missed something. Would you mind giving me some hints
> so that I can study ivopts and find out which cases may generate
> inferior code with a variable IV step?

I think AArch64 was sensitive to this because (a) the vectoriser creates
separate IVs for each base address and (b) for SVE, we instead want
invariant base addresses that are indexed by the loop control IV.
Like Richard says, if the loop control IV isn't a SCEV, ivopts isn't
able to use it and so (b) fails.
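
To make the addressing point concrete, here is a hypothetical scalar C sketch (function names are made up, inner loops stand in for single vector operations, and this is not actual vectorizer output). It contrasts separate pointer IVs each advanced by a variable length with invariant base addresses indexed by one loop control IV:

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size (size_t a, size_t b) { return a < b ? a : b; }

/* Shape (a): one pointer IV per base address, each advanced by LEN,
   which is recomputed every iteration (a variable step).  */
static void copy_separate_ivs (int *dst, const int *src, size_t n, size_t vf)
{
  size_t remaining = n;
  assert (vf > 0);
  while (remaining > 0)
    {
      size_t len = min_size (remaining, vf);
      for (size_t i = 0; i < len; i++)   /* stands in for one vector op */
        dst[i] = src[i];
      dst += len;
      src += len;
      remaining -= len;
    }
}

/* Shape (b): invariant base addresses DST and SRC indexed by a single
   loop control IV.  IDX advances by the loop-invariant VF, so it is an
   affine IV (0, VF, 2*VF, ...).  */
static void copy_indexed (int *dst, const int *src, size_t n, size_t vf)
{
  assert (vf > 0);
  for (size_t idx = 0; idx < n; idx += vf)
    {
      size_t len = min_size (n - idx, vf);
      for (size_t i = 0; i < len; i++)   /* stands in for one vector op */
        dst[idx + i] = src[idx + i];
    }
}
```

In (b) the control IV has a loop-invariant step, so scalar evolution can describe it and the indexed addresses can be built from it; in (a) each step depends on the previous iteration's MIN, which is the variable-step situation discussed above.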

Thanks,
Richard
