https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122297

Robin Dapp <rdapp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdapp at gcc dot gnu.org

--- Comment #9 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I think the anamnesis is that when length-control was introduced we only had
loads/stores with length so the record_loop_len for those were correct.  Back
then there was no support for other loads (like lanes) nor for
vectorizable_live_operation, as we didn't even have vec_extract with variable
index.  When those were later adjusted for length-control, their
record_loop_len calls didn't get the factor right (as riscv measures in
lanes/elements).

Therefore, the "factor" here (get_loop_len, not record_loop_len) was implicitly
always the same.  With vec_extract we want to count in lanes rather than bytes
so obviously there is a mismatch.  And if vectorizable_live_operation's
record_loop_len is the only one that ever gets called we will record an
rgl-factor of 1.  So I'd say dividing by rgl->factor is ok in principle if the
len is used for an extract/extract_last etc.  Also, when it is used that way,
we presumably don't need the adjusted length either?
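
To illustrate the mismatch, here is a toy sketch (plain C++, not GCC code; the
function name and parameters are made up for this example).  It assumes the
recorded length is an exact multiple of the factor and shows why a byte-scaled
length has to be divided before it can serve as a lane count for an
extract_last-style use:

```cpp
#include <cassert>

// Toy model only; toy_last_lane is a hypothetical helper, not a GCC function.
// RECORDED_LEN is a loop length recorded with scaling factor FACTOR
// (1 = counted in lanes, N = counted in bytes for N-byte elements).
long
toy_last_lane (long recorded_len, long factor)
{
  // Assumption: the recorded length is an exact multiple of the factor.
  assert (factor > 0 && recorded_len % factor == 0);
  long lanes = recorded_len / factor;  // undo the scaling
  return lanes - 1;                    // index of the last active lane
}
```

With three active 4-byte elements, a byte length of 12 and a lane length of 3
both have to end up at lane 2; using 12 directly as a lane count would not.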

So basically we need the unbiased and unscaled element length for e.g.
vec_extract but the potentially biased and scaled length for len_load/store.
If we get rid of factor altogether we'd need to re-do the scaling and biasing
right before the load/store and hope to PRE/CSE those statements so we're not
worse overall.

It would also be nice not having to divide to re-use a len...

A lot of words to basically say the same thing as Richi ;)

I think I'd lean towards having get_loop_len keep doing the scaling and biasing
but add an argument that indicates that the user wants the original (unbiased,
unscaled) length.  Or have this be the default and add a flag for the scaled,
biased length.
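
Roughly what I have in mind, as a toy sketch rather than a patch: the struct,
the enum and the function below are hypothetical stand-ins and do not match
the real vect_get_loop_len signature.  The order of scaling and biasing
(multiply first, then add the bias) is also just an assumption for the example.

```cpp
#include <cassert>

// Toy model only; all names below are hypothetical, not GCC's.
struct toy_rgroup_len
{
  long elems;   // unbiased, unscaled length in elements (lanes)
  long factor;  // scale for len_load/store, e.g. bytes per element
  long bias;    // load/store bias, assumed to be 0 or -1
};

enum toy_len_kind
{
  TOY_LEN_ELEMS,         // what vec_extract-like users want
  TOY_LEN_SCALED_BIASED  // what len_load/store wants
};

long
toy_get_loop_len (const toy_rgroup_len &rgl, toy_len_kind kind)
{
  assert (rgl.factor >= 1);
  if (kind == TOY_LEN_ELEMS)
    return rgl.elems;
  // Assumed order: scale to the load/store unit first, then apply the bias.
  return rgl.elems * rgl.factor + rgl.bias;
}
```

That way nobody has to divide to re-use a len; the element count is stored
once and the scaled, biased variant is derived on request.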

Regarding

  if (rgl->factor == 1 && factor == 1)

IIRC this (as well as "factor", not rgl->factor, in the first place) was put
into place for riscv where the factor is always 1 and multiple rgroups are
possible.  For s390 this shouldn't happen as we disallow more than 1 rgroup? 
On power it could but I guess we don't support the instructions that would
register a factor of 1?  And I do seem to remember that length control was very
verbose on power.

(As a side remark, and despite being documented differently, I wouldn't call
the byte approach a fallback.  It's as valid as the lane-counting approach and
we just happen to try a byte len mode second.)
