On Wed, Nov 02, 2011 at 01:10:14PM +0400, Kirill Yukhin wrote:
> Actually I did not get the point.
> If we have no src/masking, destination must be unchanged until gather
> will write to it (at least partially)
> If we have all 1's in mask, src must not be changed at all.
> So, nullification in intrinsics just useless.
> Having such snippet:
> (1) vmovdqa k(%rax,%rax), %ymm1
> (2) vmovaps %ymm0, %ymm6
> (3) vmovaps %ymm0, %ymm2
> (4) vmovdqa k+32(%rax,%rax), %ymm3
> (5) vgatherdps %ymm6, vf1(,%ymm1,4), %ymm2
>
> Looks pretty strange. Which value has ymm0? If it has all zeroes, then
> (1)-(5) is dead code, which may be just removed.
> If contains all 1s then (2) is useless.

%ymm0 is all ones (this is code from the auto-vectorization).
(2) is not useless, %ymm6 contains the mask.  For auto-vectorization (3) is
useless, because if vgatherdps above doesn't segfault, the whole register
will be overwritten, and if it does segfault, nothing anywhere says that
the scalar code was supposed to be vectorized through vgatherdps and what
the destination register should contain.  It is there just because the
current gather insn patterns always use the previous value of the
destination register.

My question was about the intrinsics.  If user writes something like the
proglet below, can he have any expectations on what will be the content of
the destination register of the vpgather* insn that crashed (e.g. if the
segfault handler decides to skip the vpgather* insn and longjmps to the
next insn)?  Currently 0 would be put there, because avx2intrin.h uses
there src { 0, 0 ... } and mask { -1, -1 ... }.

#define _GNU_SOURCE
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/ucontext.h>
#include <x86intrin.h>

__m256i a, b;
long long c[3] = { 64, 65, 66 };

void
segv (int signum, siginfo_t *info, void *ctx)
{
  ucontext_t *uc = (ucontext_t *) ctx;
  gregset_t *gregs = &uc->uc_mcontext.gregs;
  /* Address of the faulting insn.  */
  unsigned char *eip = (unsigned char *) (*gregs)[REG_RIP];
  printf ("%p\n", (void *) eip);
  exit (0);
}

int
main ()
{
  struct sigaction sa;
  sa.sa_sigaction = segv;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  if (sigaction (SIGSEGV, &sa, NULL) != 0)
    return 1;
  /* One of the four addresses is NULL, so the gather should fault.  */
  b = _mm256_set_epi64x ((uintptr_t) &c[0], (uintptr_t) &c[1],
			 (uintptr_t) NULL, (uintptr_t) &c[2]);
  a = _mm256_i64gather_epi64 (NULL, b, 1);
  printf ("%llx %llx %llx %llx\n",
	  ((long long *) &a)[0], ((long long *) &a)[1],
	  ((long long *) &a)[2], ((long long *) &a)[3]);
  return 0;
}

BTW, sde doesn't seem to work here as documented for the insn:
TID0: Read 0x42 = *(UINT64*)0x6009f0
TID0: Read 0x42 = *(UINT64*)0
TID0: Read 0x41 = *(UINT64*)0x6009e8
TID0: Read 0x40 = *(UINT64*)0x6009e0
TID0: INS 0x0000000000400523 vpgatherqq ymm0, qword ptr [rax+ymm1*1], ymm2
TID0: YMM0 := 00000000_00000040_00000000_00000041_00000000_00000042_00000000_00000042

Or did I misunderstand the documentation and the insn isn't supposed to
segfault?

And, if user can't expect anything in the register because the intrinsic
doesn't even have any src/mask arguments, what about if
  a = _mm256_i64gather_epi64 (NULL, b, 1);
in the testcase is replaced with:
  __m256i d, e;
  d = _mm256_set_epi64x (1, 2, 3, 4);
  e = _mm256_set_epi64x (-1, -1, -1, -1);
  a = _mm256_mask_i64gather_epi64 (d, NULL, b, e, 1);
Again, do the intrinsics (as opposed to the hw insn) make any guarantees on
what will be in the register after the segfault?  Does the compiler have to
load the destination register of the vpgather* insn with the
{ 1LL, 2LL, 3LL, 4LL } vector before the insn, or is it free to optimize
that away as it can see the mask loads all values?

Can you ask what ICC does here and what the intrinsics semantics should be?

	Jakub
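
For reference, the src/mask merging that the questions above rely on can be sketched as a scalar model in C. This is only an illustration of the non-faulting case: the function name model_mask_gather_epi64 and its array-based signature are hypothetical and are not taken from avx2intrin.h, and the model deliberately says nothing about what the destination holds after a fault, which is exactly the open question.

```c
#include <assert.h>

/* Hypothetical scalar model of a 4 x 64-bit masked gather (illustrative
   only, not the avx2intrin.h implementation).  An element is loaded from
   base + idx[i] * scale only when the sign bit of mask[i] is set;
   otherwise dst[i] keeps the value of src[i].  */
static void
model_mask_gather_epi64 (long long dst[4], const long long src[4],
			 const char *base, const long long idx[4],
			 const long long mask[4], int scale)
{
  int i;

  /* The hw insn only encodes scales of 1, 2, 4 and 8.  */
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);

  for (i = 0; i < 4; i++)
    {
      dst[i] = src[i];
      if (mask[i] < 0)
	dst[i] = *(const long long *) (base + idx[i] * scale);
    }
}
```

With an all-ones mask every element of dst is overwritten by a load, so in this model the src values can only ever be observed if the gather does not complete.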