Hello,

On Tue, 11 Jul 2017, Jeff Law wrote:

> This patch series is designed to mitigate the problems exposed by the
> stack-clash exploits.  As I've noted before, the way to address this
> class of problems is via a good stack probing strategy.
>
> This has taken much longer than expected to pull together for
> submission.  Sorry about that.  However, the delay has led to some clear
> improvements on ppc, aarch64 and s390 as well as tests which aren't
> eyeballed, but instead are part of the testsuite.
>
> This series introduces -fstack-check=clash which is a variant of
> -fstack-check designed to prevent "jumping the stack" as seen in the
> stack-clash exploits.

FWIW, this is the patch we're going to use in our older compilers (going
back as far as 4.1, meh) in one variant or another.  It only probes for
dynamic allocations, not for static stack frames.  And it probes more
often than strictly necessary.  But on the plus side it is completely
target independent (except that it requires STACK_GROWS_DOWNWARD;
upward-growing stacks aren't handled because we don't have hppa), it is
only 70 lines, it doesn't interact with any of the hairy existing stack
checking code, and it's easy to see that it does the right thing :)

(This particular variant is for 4.3, but the code of
allocate_dynamic_stack_space() has been essentially stable for a very
long time, which is another plus of this patch: it's easy to back- and
forward-port :) )

I'm not suggesting this for inclusion, but in case others are in a
similar position of having to deal with old compilers and are fine with
the above, they might find this useful.


Ciao,
Michael.

--- gcc/common.opt.mm	2017-06-26 16:07:55.000000000 +0200
+++ gcc/common.opt	2017-06-26 16:05:27.000000000 +0200
@@ -966,6 +966,10 @@ fstack-check
 Common Report Var(flag_stack_check)
 Insert stack checking code into the program
 
+fstack-probe
+Common Report Var(flag_stack_probe)
+Insert stack checking code into the program
+
 fstack-limit
 Common
 
--- gcc/explow.c.mm	2008-11-05 22:19:47.000000000 +0100
+++ gcc/explow.c	2017-06-26 17:31:25.000000000 +0200
@@ -1071,6 +1071,9 @@ update_nonlocal_goto_save_area (void)
 rtx
 allocate_dynamic_stack_space (rtx size, rtx target, int known_align)
 {
+  rtx loop_lab, end_lab, last_size;
+  int probe_pass = 0;
+
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
      address anyway.  */
@@ -1203,6 +1206,24 @@ allocate_dynamic_stack_space (rtx size,
 
   mark_reg_pointer (target, known_align);
 
+  if (flag_stack_probe)
+    {
+      size = copy_to_mode_reg (Pmode, convert_to_mode (Pmode, size, 1));
+      loop_lab = gen_label_rtx ();
+      end_lab = gen_label_rtx ();
+      emit_label (loop_lab);
+#ifndef STACK_GROWS_DOWNWARD
+#error stack must grow down
+#endif
+      emit_cmp_and_jump_insns (size, GEN_INT (STACK_CHECK_PROBE_INTERVAL), LTU,
+                               NULL_RTX, Pmode, 1, end_lab);
+      last_size = expand_binop (Pmode, sub_optab, size, GEN_INT (STACK_CHECK_PROBE_INTERVAL), size,
+                                1, OPTAB_WIDEN);
+      gcc_assert (last_size == size);
+      size = GEN_INT (STACK_CHECK_PROBE_INTERVAL);
+    }
+
+again:
   /* Perform the required allocation from the stack.  Some systems do
      this differently than simply incrementing/decrementing from the
      stack pointer, such as acquiring the space by calling malloc().  */
@@ -1264,6 +1285,15 @@ allocate_dynamic_stack_space (rtx size,
       emit_move_insn (target, virtual_stack_dynamic_rtx);
 #endif
     }
+  if (flag_stack_probe && probe_pass == 0)
+    {
+      probe_pass = 1;
+      emit_stack_probe (target);
+      emit_jump (loop_lab);
+      emit_label (end_lab);
+      size = last_size;
+      goto again;
+    }
 
   if (MUST_ALIGN)
     {
@@ -1280,6 +1310,8 @@ allocate_dynamic_stack_space (rtx size,
                               GEN_INT (BIGGEST_ALIGNMENT / BITS_PER_UNIT),
                               NULL_RTX, 1);
     }
+  if (flag_stack_probe)
+    emit_stack_probe (target);
 
   /* Record the new stack level for nonlocal gotos.  */
   if (cfun->nonlocal_goto_save_area != 0)
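
For anyone who wants to reason about where the probes land, the run-time
effect of the emitted RTL can be modelled in plain C.  This is only a
rough sketch and not part of the patch: simulate_probes is a hypothetical
helper, the interval is passed in as a parameter rather than taken from
STACK_CHECK_PROBE_INTERVAL, and the extra adjustment done under
MUST_ALIGN is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: models the run-time
   behaviour of the code emitted for one dynamic allocation of SIZE
   bytes.  Stores in OFFSETS how far below the original stack pointer
   each probe lands (at most MAX entries are written) and returns the
   total number of probes.  */
size_t
simulate_probes (size_t size, size_t interval, size_t *offsets, size_t max)
{
  size_t allocated = 0;   /* bytes the stack pointer has moved so far */
  size_t nprobes = 0;

  assert (interval > 0);

  /* loop_lab: while at least one full interval remains, allocate
     exactly one interval and probe the new stack pointer.  */
  while (size >= interval)
    {
      size -= interval;
      allocated += interval;
      if (nprobes < max)
        offsets[nprobes] = allocated;
      nprobes++;
    }

  /* end_lab: allocate the remainder, which may be zero bytes.  */
  allocated += size;

  /* The last hunk probes once more, unconditionally.  */
  if (nprobes < max)
    offsets[nprobes] = allocated;
  nprobes++;

  return nprobes;
}
```

Assuming an interval of 4096, a 10000-byte request is probed at offsets
4096, 8192 and 10000 below the original stack pointer.  An 8192-byte
request is probed at 4096, 8192 and then 8192 again, which is one
instance of the "more often than strictly necessary" behaviour.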