https://bugs.kde.org/show_bug.cgi?id=409429

            Bug ID: 409429
           Summary: False positives at unexpected location due to failure
                    to recognize cmpeq as a dependency breaking idiom
           Product: valgrind
           Version: 3.11.0
          Platform: Ubuntu Packages
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: memcheck
          Assignee: jsew...@acm.org
          Reporter: travis.do...@gmail.com
  Target Milestone: ---

Consider the following reproduction case:

#include <immintrin.h>

__attribute__((noinline))
int gather(const int *array, void *p, void *p2) {
    __m256i ret = _mm256_i32gather_epi32((const int*)array,
_mm256_setzero_si256(), 4);
    return ret[0];
}

struct bunch_of_chars {
    char stuff[128];
};

int main() {
    int* array = new int[8]{};
    bunch_of_chars x, y = x;
    return gather(array, &x, &y);
}

Here, some copying of uninitialized value happens at y = x. Neither x nor y are
used, so this doesn't usually trigger a warning in Valgrind (and indeed such
copying of uninitalized values w/o using them is common so avoiding a false
positive here is important).

Then, however, we call gather() function, only only operates on
zero-initialized array using a gather instruction. We only pass pointers to x
and y to avoid the compiler optimizing them away completely (you could do it in
other ways too, e.g., assigning them to volatile: in the real example this was
derived from some parts of x and y were later used).

We get a valgrind failure as follows:

==13033== Memcheck, a memory error detector
==13033== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13033== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13033== Command: ./bench
==13033== 
==13033== Use of uninitialised value of size 8
==13033==    at 0x400718: _mm256_i32gather_epi32 (avx2intrin.h:1598)
==13033==    by 0x400718: gather(int const*, void*, void*) (main.cpp:6)
==13033==    by 0x400608: main (in /home/tdowns/dev/sorted-intersections/bench)

Here's the assembly for gather function:

0000000000400710 <gather(int const*, void*, void*)>:
  400710:       vpxor  xmm1,xmm1,xmm1
  400714:       vpcmpeqd ymm2,ymm2,ymm2
  400718:       vpgatherdd ymm0,DWORD PTR [rdi+ymm1*4],ymm2
  40071e:       vmovq  rax,xmm0
  400723:       vzeroupper 
  400726:       ret    
  400727:       nop    WORD PTR [rax+rax*1+0x0]

Here's an excerpt of the assembly leading up to the main function:

  40057f:       vmovdqa xmm0,XMMWORD PTR [rsp]
  400584:       vmovdqa xmm1,XMMWORD PTR [rsp+0x10]
  40058a:       mov    rsi,rsp
  40058d:       lea    rdx,[rsp+0x80]
  400595:       vmovdqa xmm2,XMMWORD PTR [rsp+0x20]
  40059b:       vmovdqa xmm3,XMMWORD PTR [rsp+0x30]
  4005a1:       mov    rdi,rax
  4005a4:       vmovdqa xmm4,XMMWORD PTR [rsp+0x40]
  4005aa:       vmovdqa xmm5,XMMWORD PTR [rsp+0x50]
  4005b0:       vmovaps XMMWORD PTR [rsp+0x80],xmm0
  4005b9:       vmovdqa xmm6,XMMWORD PTR [rsp+0x60]
  4005bf:       vmovdqa xmm7,XMMWORD PTR [rsp+0x70]
  4005c5:       vmovaps XMMWORD PTR [rsp+0x90],xmm1
  4005ce:       vmovaps XMMWORD PTR [rsp+0xa0],xmm2
  4005d7:       vmovaps XMMWORD PTR [rsp+0xb0],xmm3
  4005e0:       vmovaps XMMWORD PTR [rsp+0xc0],xmm4
  4005e9:       vmovaps XMMWORD PTR [rsp+0xd0],xmm5
  4005f2:       vmovaps XMMWORD PTR [rsp+0xe0],xmm6
  4005fb:       vmovaps XMMWORD PTR [rsp+0xf0],xmm7
  400604:       call   400710 <gather(int const*, void*, void*)>

We can see the uninitialized copy happening (all the movaps calls). Note that
at this point, the xmm2 register is "tainted" since it contains uninitialized
value. This is what triggers the false positive: the vgatherdd instruction uses
ymm2 as input (the mask register). However, ymm2 was already set to "all ones"
by the earlier 'vpcmpeqd ymm2,ymm2,ymm2' instruction, so its value is no longer
tainted and has no relationship to the earlier xmm2 value.

You can show that ymm2 is causing the problem by using inline assembly to
insert a clearing at the start of gather:

asm volatile ("vpxor %xmm2, %xmm2, %xmm2");

This has no effect on the logic, but now Valgrind is happy.

Valgrind should understand vpcmpeq* with the same register for both inputs as
independent of the previous value as this is common.

This issue is very hard to track down because the false positive occurs at an
unrelated place, potentially very far away from the original source since the
xmm registers may go a long time w/o being used.


SOFTWARE/OS VERSIONS
Linux: Ubuntu 16.04
Valgrind: 3.11
g++: 8.1.0

compiled with:

g++-8 -DNDEBUG -Wall -Wextra -O2 -g -march=haswell -Wno-unused-parameter 
-Wno-error=unused-variable -Wno-unknown-pragmas -std=c++14  -c -o main.o
main.cpp

although really it's -O2 that matters (so you get the vmovaps-based copy in
main.

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to