https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
IIRC AVX512 also implements fully masked loops, so the testcase should fail there too if we adjust N accordingly (to 15 or 31). Hmm, I can't seem to trigger the fully masked support here, maybe I misremember.

Btw, isn't the issue that the reduction looks at all lanes? That is, I think the code simply assumes that for fully masked loops at least one iteration is performed with all lanes active. So if you bump N to 64 + 32, does the test pass on amdgcn?
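
To make the hypothesis above concrete, here is a scalar sketch of a fully masked max reduction. It is only a model, not the vectorizer's actual code: VF, masked_max, the "stale" parameter and the "written" tracking are all hypothetical names introduced for illustration, and VF is 8 rather than the 64 lanes of amdgcn. Lanes the loop mask never enables keep the stale value, so a final reduction over all lanes only gives the right answer once some iteration has run with every lane active.

```c
#include <stddef.h>

#define VF 8 /* hypothetical lane count, chosen small for illustration */

/* Scalar model of a fully masked max reduction over a[0..n).
   'stale' models whatever a never-written accumulator lane happens to hold.
   If only_written_lanes is zero, the final reduction looks at all VF lanes
   (the suspected behaviour); otherwise it skips lanes that were never active.
   Returns 0 when no lane was considered (e.g. n == 0 with the skip enabled). */
int masked_max(const int *a, size_t n, int stale, int only_written_lanes)
{
    int acc[VF];
    int written[VF];

    for (size_t l = 0; l < VF; l++) {
        acc[l] = stale;
        written[l] = 0;
    }

    /* Vector loop: each outer step handles VF elements under a loop mask. */
    for (size_t i = 0; i < n; i += VF) {
        for (size_t l = 0; l < VF; l++) {
            if (i + l < n) { /* loop mask: lane is active */
                int v = a[i + l];
                if (!written[l] || v > acc[l])
                    acc[l] = v;
                written[l] = 1;
            }
        }
    }

    /* Epilogue: reduce the accumulator lanes to a single scalar. */
    int result = 0;
    int have = 0;
    for (size_t l = 0; l < VF; l++) {
        if (only_written_lanes && !written[l])
            continue;
        if (!have || acc[l] > result) {
            result = acc[l];
            have = 1;
        }
    }
    return result;
}
```

In this model, calling masked_max with n smaller than VF and a large stale value returns the stale value when all lanes are reduced, while any n of at least VF returns the true maximum either way. That mirrors the question of whether the test passes once N is large enough for one full iteration.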