On 06/20/2017 03:27 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 18:07, Jeff Law wrote:
>> As some of you are likely aware, Qualys has just published fairly
>> detailed information on using stack/heap clashes as an attack vector.
>> Eric B, Michael M -- sorry I couldn't say more when I contacted you about
>> -fstack-check and some PPC specific stuff.  This has been under embargo
>> for the last month.
>>
>>
>> --
>>
>>
>> http://www.openwall.com/lists/oss-security/2017/06/19/1
>>
> [...]
>> aarch64 is significantly worse.  There are no implicit probes we can
>> exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
>> So we have to track the distance to the most recent probe and when that
>> distance grows too large, we have to emit a probe.  Of course we have to
>> make worst case assumptions at function entry.
>>
> 
> I'm not sure I understand what you're saying here.  According to the
> comment above aarch64_expand_prologue, the stack frame looks like:
> 
> +-------------------------------+
> |                               |
> |  incoming stack arguments     |
> |                               |
> +-------------------------------+
> |                               | <-- incoming stack pointer (aligned)
> |  callee-allocated save area   |
> |  for register varargs         |
> |                               |
> +-------------------------------+
> |  local variables              | <-- frame_pointer_rtx
> |                               |
> +-------------------------------+
> |  padding0                     | \
> +-------------------------------+  |
> |  callee-saved registers       |  | frame.saved_regs_size
> +-------------------------------+  |
> |  LR'                          |  |
> +-------------------------------+  |
> |  FP'                          | / <- hard_frame_pointer_rtx (aligned)
> +-------------------------------+
> |  dynamic allocation           |
> +-------------------------------+
> |  padding                      |
> +-------------------------------+
> |  outgoing stack arguments     | <-- arg_pointer
> |                               |
> +-------------------------------+
> |                               | <-- stack_pointer_rtx (aligned)
> 
> Now for the majority of frames the amount of local variables is small
> and there is neither dynamic allocation nor the need for outgoing local
> variables.  In this case the first instruction in the function is
> 
>       stp     fp, lr, [sp, #-FrameSize
But the stack pointer might have already been advanced into the guard
page by the caller.   For the sake of argument assume the guard page is
0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
the caller hasn't touched the 0xf1000 page.

If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
to emit a probe prior to this stack allocation.
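
To make the arithmetic concrete, here is a rough sketch (illustration
only, not GCC code).  It assumes 4 KiB pages, which is consistent with
the addresses above but is an assumption, and it models the stp as two
8-byte stores at the new stack pointer.  The helper names are
hypothetical.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def page_of(addr):
    """Return the base address of the page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


def stp_pages(sp_at_entry, frame_size):
    """Pages written by 'stp fp, lr, [sp, #-frame_size]!'.

    Modeled as two 8-byte stores at the new stack pointer.
    """
    new_sp = sp_at_entry - frame_size
    return {page_of(new_sp), page_of(new_sp + 8)}


def jumps_guard(sp_at_entry, frame_size, guard_page):
    """True if the stores land below the guard page without touching it."""
    pages = stp_pages(sp_at_entry, frame_size)
    return guard_page not in pages and min(pages) < guard_page


# Values from the example: guard page 0xf1000, sp at entry 0xf1010.
for frame_size in (16, 32, 64):
    pages = sorted(hex(p) for p in stp_pages(0xf1010, frame_size))
    print(frame_size, pages, jumps_guard(0xf1010, frame_size, 0xf1000))
```

With these inputs a 16-byte frame stores into the guard page itself
(and so faults), while 32 and 64 byte frames store into the page below
it without ever touching the guard.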

Now because this instruction stores at *new_sp, it does allow us to
eliminate future probes and I do take advantage of that in my code.

The implementation is actually rather simple.  We keep a conservative
estimate of the offset of the last known probe relative to the stack
pointer.  At entry we have to assume the offset is:

PROBE_INTERVAL - (STACK_BOUNDARY / BITS_PER_UNIT)


A stack allocation increases the offset.  A store into the stack
decreases the offset.

A probe is required before an allocation that increases the offset to >=
PROBE_INTERVAL.

An allocation + store instruction such as shown does both, but can be
(and is) easily modeled.  The only tricky case here is that you can't
naively break it up into an allocation and store as that can force an
unnecessary probe (say if the allocated space is just enough to hold the
stored objects).
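
As a sketch of why the naive split is too pessimistic, the following
compares the two ways of modeling an allocate-and-store instruction.
This is one plausible model, not the actual patch: the constant values
(a 4096-byte probe interval and a 128-bit STACK_BOUNDARY) and the
"untouched gap" rule for the combined case are assumptions made for
illustration, and the function names are hypothetical.

```python
PROBE_INTERVAL = 4096  # assumption: 4 KiB probe interval
STACK_BOUNDARY = 128   # assumption: stack alignment in bits
BITS_PER_UNIT = 8

# Worst-case distance from sp to the last known probe at function entry.
ENTRY_OFFSET = PROBE_INTERVAL - STACK_BOUNDARY // BITS_PER_UNIT


def probe_needed_split(offset, alloc_size):
    """Naive model: treat the allocation on its own, ignoring the store."""
    return offset + alloc_size >= PROBE_INTERVAL


def probe_needed_combined(offset, alloc_size, store_bytes):
    """Combined model: the store fills the bottom of the new allocation.

    Only the untouched gap between the top of the stored bytes and the
    last known probe matters (assumed rule, for illustration).
    """
    untouched_gap = offset + alloc_size - store_bytes
    return untouched_gap >= PROBE_INTERVAL


# A 16-byte frame that is filled entirely by the 16 bytes stored.
print(probe_needed_split(ENTRY_OFFSET, 16))
print(probe_needed_combined(ENTRY_OFFSET, 16, 16))
# A 32-byte frame leaves 16 untouched bytes above the stored pair.
print(probe_needed_combined(ENTRY_OFFSET, 32, 16))
```

Under these assumptions the split model asks for a probe on the 16-byte
frame while the combined model does not, and the combined model still
asks for one once the frame reaches 32 bytes.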




> 
> 
> If the locals area gets slightly larger (>= 512 bytes) then the sequence
> becomes
>       sub     sp, sp, #FrameSize
>       stp     fp, lr, [sp]
> 
> But again this acts as a sufficient implicit probe provided that
> FrameSize does not exceed the probe interval.
And again, the store acts as a probe which can eliminate potential
probes that might occur later in the instruction stream.  But if the
allocation by the "sub" instruction causes our running offset to reach
PROBE_INTERVAL, then we must emit a probe prior to the "sub" instruction.
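
A minimal sketch of that bookkeeping over a sequence of prologue
events, assuming a 4096-byte probe interval and a worst-case entry
offset of 4080 (both assumptions for illustration; the function and
event names are hypothetical, and a probe is modeled simply as
resetting the running offset to zero):

```python
def probe_points(events, probe_interval=4096, entry_offset=4080):
    """Return indices of events that need a probe emitted before them.

    events is a list of ("alloc", size) or ("store", sp_offset) tuples.
    Each allocation is assumed to be smaller than probe_interval.
    """
    offset = entry_offset  # conservative distance from sp to last probe
    points = []
    for index, (kind, value) in enumerate(events):
        if kind == "alloc":
            if offset + value >= probe_interval:
                points.append(index)  # probe goes before this allocation
                offset = 0            # the probe is now the nearest touch
            offset += value
        elif kind == "store":
            # A store at sp + value is an implicit probe.
            offset = min(offset, value)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return points


# sub sp, sp, #512 ; stp fp, lr, [sp]  with worst-case entry assumptions
print(probe_points([("alloc", 512), ("store", 0)]))

# The same sequence followed by two more allocations with no stores.
print(probe_points([("alloc", 512), ("store", 0),
                    ("alloc", 2048), ("alloc", 2048)]))
```

The store after the first allocation resets the running offset, so the
next allocation needs no probe; only when the unprobed distance builds
back up to the interval is another probe required.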

Hopefully it'll be clearer when I post the code :-)  aarch64 is one that
will need updating as all work to date has been with Red Hat's 4.8
compiler with the aarch64 code generator bolted onto the side.

So perhaps "no implicit probes" was too strong.  It would probably be
better stated "no implicit probes in the caller".  We certainly use
stores in the prologue to try and eliminate probes.  In fact, we try
harder on aarch64 than any other target.

Jeff
