https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66852
Bug ID: 66852
Summary: vmovdqa instructions issued on 64-bit aligned array, causes segfault
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: noloader at gmail dot com
Target Milestone: ---

My apologies for *not* having a minimum working example. Sometimes it's hard to craft them, and this is one of those times.

The C++ code below causes a segfault on [relative] line 10 when using GCC 4.9/x86_64 with -O3. Line 10 is:

    ((word64*)buf)[i] ^= ((word64*)mask)[i];

From a disassembly, here's the offending code:

       0x0000000000539fae <+206>: vmovdqu (%rcx,%r10,1),%xmm0
       0x0000000000539fb4 <+212>: vinsertf128 $0x1,0x10(%rcx,%r10,1),%ymm0,%ymm0
       0x0000000000539fbc <+220>: vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0
    => 0x0000000000539fc3 <+227>: vmovdqa %ymm0,0x0(%r13,%r10,1)

Looking at the vmovdqa requirements, it appears the 256-bit (ymm) form used here requires a 32-byte aligned memory operand (the 128-bit xmm form requires 16-byte alignment). However, the array starts as a 'byte*' (unsigned char) and is then cast depending on its alignment. In this case, it's cast to a 64-bit word pointer, so only 8-byte alignment has been checked.

Here is how word64 is defined:

    #if defined(_MSC_VER) || defined(__BORLANDC__)
    typedef unsigned __int64 word64;
    #define W64LIT(x) x##ui64
    #else
    typedef unsigned long long word64;
    #define W64LIT(x) x##ULL
    #endif

**********
One system:

    $ g++ --version
    g++ (Debian 4.9.2-10) 4.9.2

    $ uname -a
    Linux debian-8-x64 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux

**********
Same problem, another system:

    $ g++ --version
    g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)

    $ uname -a
    Linux localhost.localdomain 4.0.6-200.fc21.x86_64 #1 SMP Tue Jun 23 13:59:12 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

**********
I am able to tame the problem with the following, so I guess it's a potential workaround (though I'd be happy to get other suggestions):

    #pragma GCC optimize push
    #pragma GCC optimize ("-O2")

    void xorbuf(byte *buf, const byte *mask, size_t count)
    {
        ...
    }

    #pragma GCC optimize pop

**********

    void xorbuf(byte *buf, const byte *mask, size_t count)
    {
        size_t i;

        if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
        {
            if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && IsAligned<word64>(mask))
            {
                for (i=0; i<count/8; i++)
                    ((word64*)buf)[i] ^= ((word64*)mask)[i];
                count -= 8*i;
                if (!count)
                    return;
                buf += 8*i;
                mask += 8*i;
            }

            for (i=0; i<count/4; i++)
                ((word32*)buf)[i] ^= ((word32*)mask)[i];
            count -= 4*i;
            if (!count)
                return;
            buf += 4*i;
            mask += 4*i;
        }

        for (i=0; i<count; i++)
            buf[i] ^= mask[i];
    }
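
**********
For comparison, here is a minimal sketch of a variant that never dereferences a cast pointer, so the compiler is given no alignment promise about buf or mask. It is not the reported code and not a confirmed fix for this report. The name xorbuf_unaligned_safe is hypothetical, and unsigned char and std::uint64_t stand in for byte and word64. It assumes only that memcpy may read and write objects at any address, which the language guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the report): XOR `mask` into `buf` in
// 8-byte chunks without assuming any alignment of either pointer.
void xorbuf_unaligned_safe(unsigned char *buf, const unsigned char *mask, std::size_t count)
{
    // Null pointers are only acceptable when there is nothing to process.
    assert((buf != nullptr && mask != nullptr) || count == 0);

    std::size_t i = 0;

    // Word-sized chunks: copy into local integers, XOR, copy back.
    // memcpy is well defined for any source/destination alignment.
    for (; i + 8 <= count; i += 8)
    {
        std::uint64_t b, m;
        std::memcpy(&b, buf + i, sizeof b);
        std::memcpy(&m, mask + i, sizeof m);
        b ^= m;
        std::memcpy(buf + i, &b, sizeof b);
    }

    // Remaining tail (fewer than 8 bytes), one byte at a time.
    for (; i < count; ++i)
        buf[i] ^= mask[i];
}
```

Optimizing compilers typically lower fixed-size memcpy calls like these to ordinary loads and stores, so the intent is to keep the word-at-a-time speed while leaving the choice of aligned or unaligned instructions to the compiler.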