https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |kyukhin at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
If you sed -i 's/short/int/' on the testcase, then e.g. with -mavx2 it is
vectorized with vmaskmovd.  But AVX2 does not have a masked store for packed
16-bit integers, and as Richard mentioned, using vpminuw/vmovdqu that icc emits
is IMHO invalid, as it introduces a store data race and I see no wording in the
OpenMP standard that would allow introducing store data races, even in omp simd
regions.
Now, it seems AVX512BW (and AVX512VL in some cases) has the needed
instructions, in particular VMOVDQU{8,16}, but it is not reflected in
maskload<mode> and maskstore<mode> expanders.  CCing Kyrill and Uros on this.
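For readers without the testcase at hand, here is a rough scalar sketch of the
store data race concern.  The loop shape, the function names and the `limit`
parameter are assumptions made for illustration only; they are not taken from
the PR's testcase or from icc's actual output.  The first function stores only
where the condition holds, which is what a masked store preserves.  The second
models a min followed by a full-width store, in the spirit of vpminuw/vmovdqu:
it writes every element back, including those the source never stores to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop shape: conditional store to 16-bit elements.
   Only elements above `limit` are written.  Returns the number of stores. */
size_t clamp_conditional(unsigned short *a, size_t n, unsigned short limit)
{
    assert(a != NULL || n == 0);
    size_t stores = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit) {
            a[i] = limit;          /* store happens only when the condition holds */
            stores++;
        }
    }
    return stores;
}

/* Scalar model of "load, unsigned min, full store": the final values are
   the same, but every element is written back, even unchanged ones. */
size_t clamp_unconditional(unsigned short *a, size_t n, unsigned short limit)
{
    assert(a != NULL || n == 0);
    size_t stores = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned short v = a[i];
        a[i] = v < limit ? v : limit;  /* unconditional write-back */
        stores++;
    }
    return stores;
}
```

Both functions leave the array with identical contents; they differ only in
which elements get written.  If another thread concurrently updates an element
that is not above `limit`, the first version never touches it, while the second
may overwrite that update with a stale value.  That extra write is the
introduced store data race.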