https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908

--- Comment #47 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 29 Mar 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
> 
> --- Comment #46 from Hongtao.liu <crazylht at gmail dot com> ---
> Another issue is whether to split a vector load to halves or to elements.
> The latter requires scratch registers, which may not be available; the
> former doesn't require an extra register but may still trigger STLF
> stalls. For the cray case, splitting to halves is equal to splitting to
> elements.
> 
> For x86, there are the sse/256_unaligned_load_optimal tunings, which would
> split 128/256-bit vector loads to halves.

I suggest trying the easy case first: only split when splitting would
split to elements and when that doesn't require scratch registers.
For large N (number of elements), the cost of the separate loads + inserts
will eventually offset the penalty of a failing forwarding anyway, so
splitting is less obviously a win (or less obviously not a loss).
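
To make the easy case concrete, here is a minimal sketch of the rule in
C. It is only an illustration, not GCC code: the struct, its fields,
should_split_load and the max_nelts cutoff are all hypothetical names
made up for this example, and the cutoff merely stands in for the
"large N" concern above.

```c
#include <stdbool.h>

/* Hypothetical description of a vector load that may suffer a failed
   store-to-load forward.  None of these names exist in GCC.  */
struct vec_load_info
{
  unsigned nelts;        /* N: number of elements in the vector.  */
  unsigned split_parts;  /* Pieces the target's split would produce.  */
  bool needs_scratch;    /* Whether the split needs extra registers.  */
};

/* Easy case only: split when the split goes all the way to elements and
   needs no scratch registers.  MAX_NELTS is an assumed cutoff above
   which the loads + inserts are no longer considered a clear win.  */
static bool
should_split_load (const struct vec_load_info *ld, unsigned max_nelts)
{
  if (ld->split_parts != ld->nelts)
    return false;  /* Splitting to halves only; may still stall.  */
  if (ld->needs_scratch)
    return false;  /* Scratch registers may not be available.  */
  return ld->nelts <= max_nelts;
}
```

With an assumed cutoff of 4, a two-element vector whose halves are its
elements and which needs no scratch register would be split, while an
eight-element vector would be left alone.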
