https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

Matthias Kretz <kretz at kde dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kretz at kde dot org

--- Comment #3 from Matthias Kretz <kretz at kde dot org> ---
Here's a link to quickly survey the state of the optimization:
https://godbolt.org/z/P8FFSB

The broadcast contraction generally exists for 32 and 64 bit operands (i.e. 8
and 16 bit integers do not support memory broadcasts).

Note that I also added 8 Byte vectors of float and int into my test case. Those
should also use the {1to4} broadcast.

Test case (C++14):

auto f(float a [[gnu::vector_size(64)]]) { return a OP 101; }
auto f(float a [[gnu::vector_size(32)]]) { return a OP 101; }
auto f(float a [[gnu::vector_size(16)]]) { return a OP 101; }
auto f(float a [[gnu::vector_size(8)]]) { return a OP 101; }

auto f(double a [[gnu::vector_size(64)]]) { return a OP 101; }
auto f(double a [[gnu::vector_size(32)]]) { return a OP 101; }
auto f(double a [[gnu::vector_size(16)]]) { return a OP 101; }

auto f(int a [[gnu::vector_size(64)]]) { return a OP 101; }
auto f(int a [[gnu::vector_size(32)]]) { return a OP 101; }
auto f(int a [[gnu::vector_size(16)]]) { return a OP 101; }
auto f(int a [[gnu::vector_size(8)]]) { return a OP 101; }

auto f(long long a [[gnu::vector_size(64)]]) { return a OP 101; }
auto f(long long a [[gnu::vector_size(32)]]) { return a OP 101; }
auto f(long long a [[gnu::vector_size(16)]]) { return a OP 101; }
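
The test case leaves OP undefined, so it does not compile on its own. A minimal
self-contained sketch of two of the overloads follows, assuming OP is meant to
be supplied as a macro (for example -DOP=+ on the command line). The typedef
names v4sf and v4si and the helper self_check are hypothetical and not part of
the report. It only checks the scalar-broadcast semantics at run time. Whether
the compiler emits an embedded {1to4} operand still has to be checked in the
generated assembly, e.g. with -O2 -march=skylake-avx512 -S.

```cpp
#include <cassert>

// Assumption: the original test gets OP from the command line (e.g. -DOP=+).
// Default to + here so the file compiles without extra flags.
#ifndef OP
#define OP +
#endif

// GCC/Clang vector extension: 16-byte vectors holding four 32-bit elements.
// (Hypothetical typedef names, used instead of the [[gnu::vector_size]]
// parameter attribute from the test case.)
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

// The scalar 101 is converted to the element type and applied to every lane.
// This constant is the operand the report hopes to see as a memory broadcast.
v4sf f(v4sf a) { return a OP 101; }
v4si f(v4si a) { return a OP 101; }

// Hypothetical run-time check: each lane must equal the scalar computation.
void self_check() {
  v4sf x = {1.0f, 2.0f, 3.0f, 4.0f};
  v4sf r = f(x);
  for (int i = 0; i < 4; ++i)
    assert(r[i] == (x[i] OP 101.0f));
}
```

Compiling this with different -DOP= values and comparing the assembly for each
vector width is one way to reproduce the survey linked above locally.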