On 07/16/2018 07:54 AM, Tamar Christina wrote:
> The 07/13/2018 17:46, Jeff Law wrote:
>> On 07/12/2018 11:39 AM, Tamar Christina wrote:
>>>>> +
>>>>> +  /* Round size to the nearest multiple of guard_size, and calculate the
>>>>> +     residual as the difference between the original size and the rounded
>>>>> +     size. */
>>>>> +  HOST_WIDE_INT rounded_size = size & -guard_size;
>>>>> +  HOST_WIDE_INT residual = size - rounded_size;
>>>>> +
>>>>> +  /* We can handle a small number of allocations/probes inline.  
>>>>> Otherwise
>>>>> +     punt to a loop.  */
>>>>> +  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
>>>>> +    {
>>>>> +      for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
>>>>> + {
>>>>> +   aarch64_sub_sp (NULL, temp2, guard_size, true);
>>>>> +   emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>>>>> +                                    STACK_CLASH_CALLER_GUARD));
>>>>> + }
>>>> So the only concern with this code is that it'll be inefficient and
>>>> possibly incorrect for probe sizes larger than ARITH_FACTOR.
>>>> Ultimately, that's a case I don't think we really care that much about.
>>>> I wouldn't lose sleep if the port clamped the requested probing interval
>>>> at ARITH_FACTOR.
>>>>
>>> I'm a bit confused here, the ARITH_FACTOR seems to have to do with the Ada
>>> stack probing implementation, which isn't used by this new code aside
>>> from the part that emits the actual probe when doing a variable or large
>>> allocation in aarch64_output_probe_stack_range.
>>>
>>> Clamping the probing interval at ARITH_FACTOR means we can't do 64KB
>>> probing intervals. 
>> It may have been a misunderstanding on my part.  My understanding is
>> that ARITH_FACTOR represents the largest immediate constant we could
>> handle in this code using a single insn.  Anything above ARITH_FACTOR
>> needed a scratch register and I couldn't convince myself that we had a
>> suitable scratch register available.
>>
>> But I'm really not well versed on the aarch64 architecture or the
>> various implementation details in aarch64.c.  So I'm happy to defer to
>> you and others @ ARM on what's best to do here.
> 
> Ah no, that 12 bit immediate for str offset is unscaled. Scaled it's 15 bits 
> for the 64bit store case.
> So the actual limit is 32760, so it's quite a bit larger than ARITH_FACTOR.
> 
> The value of STACK_CLASH_CALLER_GUARD is fixed in the back-end and can't be
> changed, and if it's made too big will just give a compile error.
ACK.  Thanks for explaining.


> 
>>
>>
>>>> That can be problematical in a couple cases.  First it can run afoul of
>>>> combine-stack-adjustments.  Essentially that pass will combine a series
>>>> of stack adjustments into a single insn so your unrolled probing turns
>>>> into something like this:
>>>>
>>>>   multi-page stack adjust
>>>>   probe first page
>>>>   probe second page
>>>>   probe third page
>>>>   and so on..
>>>>
>>> This is something we actually do want, particularly in the case of 4KB pages
>>> as the immediates would fit in the store.  It's one of the things we were
>>> thinking about for future improvements.
>>>
>>>> That violates the design constraint that we never allocate more than a
>>>> page at a time.
>>> Would there be a big issue here if we didn't adhere to this constraint?
>> Yes, because it enlarges a window for exploitation.  Consider signal
>> delivery occurring after the adjustment but before the probes.  The
>> signal handler could then be running on a clashed heap/stack.
>>
> 
> Ah, you're quite right.. I didn't factor in asynchronous events during the 
> stack
> probing.  I have restored the barriers and added some documentation on why 
> they're
> needed.
Sounds good.


> 
>>>
>>>> Do you happen to have a libc.so and ld.so compiled with this code?  I've
>>>> got a scanner here which will look at a DSO and flag obviously invalid
>>>> stack allocations.  If you've got a suitable libc.so and ld.so I can run
>>>> them through the scanner.
>>>>
>>>>
>>>> Along similar lines, have you run the glibc testsuite with this stuff
>>>> enabled by default.  That proved useful to find codegen issues,
>>>> particularly with the register inheritance in the epilogue.
>>>>
>>> I'm running one now, I'll send the two binaries once I get the results back
>>> from the run. Thanks!
>> Great.  Looking forward getting those  .so files I can can throw them
>> into the scanner.
> 
> I have finished running the glibc testsuite and there were no new regressions.
Great.  And as mentioned privately, scanning looks reasonable.

So I've got no concerns at this point.  If the aarch64 maintainers are
OK with the changes this stuff can go in.

jeff

Reply via email to