rnk added a comment.
For maintenance reasons, I'd really prefer it if we could find a way to reuse
the existing code that calls an external stack probe function. What do you
think about taking a look at X86RetpolineThunks.cpp and doing something similar
to that? Basically, when the user sets `-fstack-clash-protection`, LLVM will
emit a small comdat+weak function into every object file that has the same ABI
as the existing stack probe mechanism. For other prior art, you can also look
at how `__clang_call_terminate` works.
================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:400
+ !(STI.isOSWindows() && !STI.isTargetMachO());
+ if (InlineStackClashProtector && !InEpilogue) {
+ const uint64_t PageSize = TLI.getStackProbeSize(MF);
----------------
serge-sans-paille wrote:
> efriedma wrote:
> > Why is this code in a different location from the stack probing code that
> > generates a call?
> Because `BuildStackAdjustment` has more callsites than just emitPrologue and
> we want to capture all stack manipulation.
If you care about those cases, we should have tests for all of them. These are
all the cases:
1. TailCallReturnAddrDelta: When doing guaranteed TCO for a callee with >4K of
argument memory
2. eliminateCallFramePseudoInstr: When doing stack adjustments to set up a call
with more than 4K of arguments
Both of these cases involve setting up arguments to calls, and I think we can
guarantee that the compiler will emit writes to every argument stack slot. So,
to set up one of these cases, we'd have this code:
```
struct EightK { int a[2048]; };
void passEightK(EightK);
void foo() {
EightK x;
memset(&x, 0, sizeof(x));
passEightK(x); // some targets will pass indirect, 32-bit probably uses byval.
}
```
In this case, LLVM happens to use rep;movsl to copy the argument memory. It
could use memcpy, but either way, it will probe all those bytes.
I think that means it's safe to move to emitSPUpdate. I would greatly prefer
that `BuildStackAdjustment` remain a simple primitive that generates a single
instruction.
================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:353
+static const DenseMap<unsigned, unsigned> OpcodeToSPOperandIndex = {
+ {X86::MOV8mi, 3}, {X86::MOV16mi, 3}, {X86::MOV32mi, 3}, {X86::MOV64mi32,
3},
----------------
Please no dynamically initialized globals. The LLVM-y thing would be to use a
switch.
================
Comment at: llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll:9-11
+; CHECK: subq $4096, %rsp
+; CHECK: movq $0, (%rsp)
+ %a = alloca i32, i32 %n, align 16
----------------
This seems like a good use case for update_llc_test_checks.py. I'd want to see
the body of the loop, the cmp, the jl, etc.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68720/new/
https://reviews.llvm.org/D68720
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits