On Sat, Oct 8, 2011 at 5:43 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> The AVX2 docs say that the insns will #UD if any of the mask, src and index > registers are the same, but e.g. on > #include <x86intrin.h> > > __m256 m; > float f[1024]; > > __m256 > foo (void) > { > __m256i mi = (__m256i) m; > return _mm256_mask_i32gather_ps (m, f, mi, m, 4); > } > > which is IMHO valid and should for m being zero vector just return a > zero vector and clear mask (in this case it was already cleared) we compile > it as > vmovdqa m(%rip), %ymm1 > vmovaps %ymm1, %ymm0 > vgatherdps %ymm1, (%rax, %ymm1, 4), %ymm0 > and thus IMHO it will #UD. Also, the insns should make it clear that > the mask register is modified too (the patch clobbers it, perhaps > we could instead say that it zeros the register (which is true if > it doesn't segfault), but then what if a segfault handler chooses to > continue with the next insn and doesn't clear the mask register?). > Still, the insn description is imprecise, saying that it loads from mem > at the address register is wrong and perhaps some DCE might delete > what shouldn't be deleted. So, either it should (use (mem (scratch))) > or something similar, or in the unspec list all the memory locations > that are being read > (mem:<scalarssemode> (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI) > (parallel [(const_int N)])))) > for N 0 through something (but it is complicated by Pmode size vs. > the need to do nothing/truncate/sign_extend the vec_select to the right > mode). > > What do you think? Regarding the clear of mask operand: I agree that this should be modelled as a clobber. Zeroing can't be guaranteed due to the fact you described above. About memory - can't we use (mem:BLK (match_operand:P "register_operand" "r")) here? BTW: No need to use %c modifier: /* Meaning of CODE: L,W,B,Q,S,T -- print the opcode suffix for specified size of operand. C -- print opcode suffix for set/cmov insn. c -- like C, but print reversed condition ... */ Uros.