https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104112
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and we're also not verifying

          /* The target has to make sure we support lowpart/highpart
             extraction, either via direct vector extract or through
             an integer mode punning.  */

and an alternative would be to do the reduction in the wider mode and only
do a final lowpart extraction, but that would require the support of
some intermediate const permutes so we get

 { 0, 1, 2, 3, 4, 5, 6, 7 } + { 4, 5, 6, 7, /* dont-care */ }
 + { 2+6, 3+7, /* dont-care */ }

basically whole-vector shifts by half and a quarter of the vector for one
missing intermediate mode and then the appropriate lowpart of the final
vector mode.

Code-gen wise, with the proposed patch we get no accumulator re-use, while with
the above scheme we might be able to re-use it (not sure if SVE is capable
of that or whether that would be profitable).

Without -msve-vector-bits=512 we simply get variable-length vector code.  There
doesn't seem to be -msve-vector-bits=512,256 or so to enable both lengths
(the compiler could set a static mask to "emulate" 256 with fixed 512 vectors?)
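
To illustrate the wider-mode scheme described above (whole-vector shifts by half and a quarter, then a final lowpart extraction), here is a rough scalar C sketch. It only models the lane arithmetic; it is not what the vectorizer emits and not the proposed patch. The names NLANES, shift_down, add_lanes and reduce_to_lowpart are made up for this example, and the lanes marked dont-care above are zero-filled here just to keep the model deterministic.

```c
#include <stddef.h>

#define NLANES 8  /* hypothetical wide vector mode with 8 lanes */

/* Model of a whole-vector shift towards lane 0 by `by` lanes.
   The vacated upper lanes are dont-care in the scheme; zero-filled here. */
static void shift_down(const int *src, int *dst, size_t by)
{
    for (size_t i = 0; i < NLANES; i++)
        dst[i] = (i + by < NLANES) ? src[i + by] : 0;
}

/* Lane-wise add of `other` into `acc`, always in the wide mode. */
static void add_lanes(int *acc, const int *other)
{
    for (size_t i = 0; i < NLANES; i++)
        acc[i] += other[i];
}

/* Reduce 8 lanes to 2 partial sums without using a 4-lane mode:
   shift by half and add, shift by a quarter and add, then take
   the 2-lane lowpart of the wide accumulator. */
void reduce_to_lowpart(const int *in, int *out2)
{
    int acc[NLANES], tmp[NLANES];

    for (size_t i = 0; i < NLANES; i++)
        acc[i] = in[i];

    shift_down(acc, tmp, NLANES / 2);  /* { 4, 5, 6, 7, dont-care } */
    add_lanes(acc, tmp);               /* lanes 0..3: 0+4, 1+5, 2+6, 3+7 */

    shift_down(acc, tmp, NLANES / 4);  /* { 2+6, 3+7, dont-care } */
    add_lanes(acc, tmp);               /* lanes 0..1 now hold the result */

    out2[0] = acc[0];                  /* final lowpart extraction */
    out2[1] = acc[1];
}
```

With the lane values { 0, 1, 2, 3, 4, 5, 6, 7 } as input, lane 0 of the lowpart ends up as 0+4+2+6 and lane 1 as 1+5+3+7. Only the low two lanes of the accumulator are meaningful at the end, which is why the upper lanes of each shifted operand can be dont-care.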