We recently debugged, and found a workaround for, a GCC [###version] 
code-generation error when compiling OpenSSL 3.0.8 for 32-bit on Intel x86. 
This error resulted in a use of a misaligned memory operand with a 
packed-quadword instruction, producing a SIGSEGV on RedHat 8. (I'm a bit 
surprised Linux doesn't raise SIGBUS for this particular trap, but whatever.) I 
wanted to document this here in case other people run into it.

Aside: This does raise the question: Why aren't other people running into it? 
And why are we only seeing it now? Honestly, I don't know. It is sensitive to 
stack layout, but in some of our tests we could reproduce it consistently. It's 
possible you'll never see this in a program where the path into the sensitive 
functions in gcm128.c, which appear to be CRYPTO_gcm128_aad, 
CRYPTO_gcm128_encrypt, and CRYPTO_gcm128_decrypt, is made up completely of code 
compiled with GCC. In our case we have non-GCC code along that path in some 
cases, and that non-GCC code does not follow GCC's rather arbitrary stack-frame 
alignment rules for x86, so GCC may be making an invalid assumption about 
callers further up the stack and how they'll pad and align stack frames.

(It's known that with default build flags and optimization, GCC requires that 
callers align *parameters* strictly, because it may generate SSE code for 
operations on 64-bit and larger operands. But the problem here isn't a 
parameter, as I'll show in a moment.)

Anyway, back to the issue.

The affected functions declare a 64-bit integer object with automatic storage 
class:

    u64 alen = ctx->len.u[0];

and then operate on it:

    alen += len;

GCC, under appropriate conditions, generates code that performs a 
packed-quadword operation (specifically a PADDQ) with alen as a memory operand. 
With an XMM register, that form requires the memory operand to be 16-byte 
aligned. However, the generated code does not put alen on a 16-byte boundary; 
examining its address before the trap occurs confirms it ends with 0x8.
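
As a rough illustration of that address check, here is a minimal sketch of an alignment test in C. The function names and the sample address are hypothetical, not taken from OpenSSL or from our debugging session. It assumes the instruction is the SSE2 form of PADDQ, whose 128-bit memory operand must be 16-byte aligned; an address ending in 0x8 is a multiple of 8 but not of 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of boundary.
 * boundary must be a nonzero power of two (for example 4, 8, or 16). */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}

/* Hypothetical convenience wrapper for checking an object's address,
 * e.g. ptr_is_aligned(&alen, 16) from a debugging build. */
int ptr_is_aligned(const void *p, size_t boundary)
{
    return is_aligned((uintptr_t)p, (uintptr_t)boundary);
}
```

For a made-up address such as 0xffffc8d8, is_aligned(addr, 8) is true and is_aligned(addr, 16) is false, which is the kind of address that would fault when used as a 128-bit SSE memory operand.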

The fix we're using is to add -mstackrealign to the build flags for OpenSSL on 
GCC x86 platforms. That adds prologue code to each function which checks the 
stack alignment at runtime and fixes it if necessary. This does carry some 
performance cost, which we have not yet tried to measure.
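
For comparison, GCC also documents a per-function attribute, force_align_arg_pointer, that requests the same kind of realigning prologue for a single function rather than the whole build. The sketch below is not what we did and has not been tested against this problem; the REALIGN_STACK macro, the add_len function, and the u64 typedef are hypothetical stand-ins for the pattern in gcm128.c.

```c
#include <stdint.h>

/* Assumption: stand-in for OpenSSL's internal u64 type. */
typedef uint64_t u64;

/* Hypothetical macro: apply the attribute only for GCC-compatible
 * compilers targeting 32-bit x86; expand to nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
# define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
# define REALIGN_STACK
#endif

/* Hypothetical function mirroring the pattern above: a 64-bit automatic
 * variable that is updated with a 64-bit add. The attribute asks GCC to
 * realign the stack on entry, so locals are placed relative to an
 * aligned stack pointer even if the caller did not align its frame. */
REALIGN_STACK
u64 add_len(u64 current, u64 len)
{
    u64 alen = current;
    alen += len;
    return alen;
}
```

The trade-off is scope: the attribute touches only the functions you mark, so it costs less than the build-wide flag, but you have to identify every entry point that can be reached from non-GCC callers.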

After quite a bit of investigation, we're fairly confident we'd call this a GCC 
bug. It looks like a consequence of the "fix" for GCC bug 65105, which was made 
a couple of years ago, to use XMM registers in 32-bit generated code on x86. 
GCC has an unfortunate history of assuming stronger stack-alignment rules on 
this platform than are required by the ISA or enforced by other languages and 
compilers, and some members of the GCC team are a bit notorious for their ... 
enthusiasm ... in justifying this position.

We have not yet attempted to raise this as a GCC bug, because, well, I've read 
those discussions in the GCC forums.

-- 
Michael Wojcik
