https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66852

            Bug ID: 66852
           Summary: vmovdqa instructions issued on 64-bit aligned array,
                    causes segfault
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: noloader at gmail dot com
  Target Milestone: ---

My apologies for *not* having a minimal working example. Sometimes it's hard
to craft one, and this is one of those times.

The C++ code below causes a segfault on [relative] line 10 when using GCC
4.9/x86_64 with -O3. Line 10 is:

    ((word64*)buf)[i] ^= ((word64*)mask)[i];

From a disassembly, here's the offending code:

   0x0000000000539fae <+206>:   vmovdqu (%rcx,%r10,1),%xmm0
   0x0000000000539fb4 <+212>:   vinsertf128 $0x1,0x10(%rcx,%r10,1),%ymm0,%ymm0
   0x0000000000539fbc <+220>:   vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0
=> 0x0000000000539fc3 <+227>:   vmovdqa %ymm0,0x0(%r13,%r10,1)

Looking at the vmovdqa requirements, it needs a naturally aligned memory
operand; for the 256-bit ymm store above that means 32-byte alignment.
However, the array starts life as a 'byte*' (unsigned char*) and is only cast
to a wider word pointer depending on its alignment.

In this case, it's cast to a 64-bit word pointer, so nothing stronger than
8-byte alignment is established. Here is how word64 is defined:

    #if defined(_MSC_VER) || defined(__BORLANDC__)
        typedef unsigned __int64 word64;
        #define W64LIT(x) x##ui64
    #else
        typedef unsigned long long word64;
        #define W64LIT(x) x##ULL
    #endif
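
For reference, the word64 cast only guarantees 8-byte alignment, while the
aligned 256-bit store (vmovdqa %ymm0) wants 32-byte alignment. A small
stand-alone check (my own illustration compiled with -std=c++11, not code
from the library) shows the gap:

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        // 32-byte aligned storage, then step 8 bytes in: the result is
        // 8-byte aligned (enough for the word64 cast), but not 32-byte
        // aligned (what the aligned ymm store requires).
        alignas(32) unsigned char a[64];
        unsigned char *p = a + 8;

        std::printf("8-byte aligned:  %d\n", (int)((std::uintptr_t)p % 8 == 0));
        std::printf("32-byte aligned: %d\n", (int)((std::uintptr_t)p % 32 == 0));
        return 0;
    }

It prints 1 and then 0: an 8-byte aligned 'buf' is no guarantee for a
32-byte aligned vmovdqa store.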

**********

One system:

    $ g++ --version
    g++ (Debian 4.9.2-10) 4.9.2

    $ uname -a
    Linux debian-8-x64 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
x86_64 GNU/Linux

**********

Same problem, another system:

    $ g++ --version
    g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)

    $ uname -a
    Linux localhost.localdomain 4.0.6-200.fc21.x86_64 #1 SMP Tue Jun 23
13:59:12 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

**********

I am able to tame the problem with the following, so I guess it's a potential
workaround (though I'd be happy to get other suggestions):

#pragma GCC push_options
#pragma GCC optimize ("-O2")

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    ...
}

#pragma GCC pop_options
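
If a pragma is not desirable, I believe the same thing can be expressed per
function with GCC's optimize attribute (an untested sketch on my part, not
something in the library):

void __attribute__((optimize("O2")))   /* or optimize("no-tree-vectorize") */
xorbuf(byte *buf, const byte *mask, size_t count)
{
    ...
}

That would keep -O3 for the rest of the file and only relax this one
function.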

**********

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    size_t i;

    if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
    {
        if (!CRYPTOPP_BOOL_SLOW_WORD64 &&
            IsAligned<word64>(buf) && IsAligned<word64>(mask))
        {
            // 64-bit pass; this is the loop GCC vectorizes, and the
            // [relative] line 10 store below is the one that faults
            for (i=0; i<count/8; i++)
                ((word64*)buf)[i] ^= ((word64*)mask)[i];
            count -= 8*i;
            if (!count)
                return;
            buf += 8*i;
            mask += 8*i;
        }

        // 32-bit pass over what remains
        for (i=0; i<count/4; i++)
            ((word32*)buf)[i] ^= ((word32*)mask)[i];
        count -= 4*i;
        if (!count)
            return;
        buf += 4*i;
        mask += 4*i;
    }

    // byte-at-a-time tail
    for (i=0; i<count; i++)
        buf[i] ^= mask[i];
}
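
For completeness, one alternative I'd consider (only a sketch I wrote for
this report, not code from the library) is to drop the pointer casts and go
through memcpy, so the compiler doesn't get to assume anything stronger than
byte alignment:

#include <cstring>
#include <cstddef>

typedef unsigned char byte;
typedef unsigned long long word64;

// Alignment-agnostic sketch: the 8-byte chunks go through memcpy, so
// any vectorization has to cope with arbitrary (byte) alignment.
void xorbuf_sketch(byte *buf, const byte *mask, size_t count)
{
    size_t i = 0;
    for ( ; i + 8 <= count; i += 8)
    {
        word64 b, m;
        std::memcpy(&b, buf + i, 8);
        std::memcpy(&m, mask + i, 8);
        b ^= m;
        std::memcpy(buf + i, &b, 8);
    }
    for ( ; i < count; i++)     // leftover tail, byte at a time
        buf[i] ^= mask[i];
}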
