Finally a patch that works and is simple. Bootstrapped and
regression tested on s390, s390x biarch and x86_64. The new patch
exploits the known alignment of (stack pointer +
STACK_DYNAMIC_OFFSET) as described earlier (see below). I think
that is the right way to get rid of the extra allocation. It
took a long time to understand the problem.
As the patch triggers a bug in the fortran compiler, the
der_type.f90 test case may fail on some targets if this patch is
used without the fortran fix that I've posted in another thread.
(The patch also contains a fix for a typo in a comment in the
patched function.)
See ChangeLog for a full description of the new patch.
Since the patch is all new, we're not going to commit it without a
new OK.
> Actually I was goind to abandon the patch in its current state.
> :-) We talked about it internally and concluded that the problem
> is really this:
>
> * get_dynamic_stack_size is passed a SIZE of a data block (which
> is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the
> alignment of the underlying memory units (e.g. 32 bytes split
> into 4 times 8 bytes = 64 bit alignment) and the
> REQUIRED_ALIGN of the data portion of the allocated memory.
> * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8
> and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9.
> This is what is needed to have two bytes 8-byte-aligned at some
> memory location without any known alignment.
> * Finally round_push is called to round up SIZE to a multiple of
> the stack slot size.
>
> The key to understanding this is that the function assumes that
> STACK_DYNMAIC_OFFSET is completely unknown at the time its called
> and therefore it does not make assumptions about the alignment of
> STACKPOINTER + STACK_DYNMAIC_OFFSET. The latest patch simply
> hard-codes that SP + SDO is supposed to be aligned to at least
> stack slot size (and does that in a very complicated way). Since
> there is no guarantee that this is the case on all targets, the
> patch is broken. It may miscalculate a SIZE that is too small in
> some cases.
>
> However, on many targets there is some guarantee about the
> alignment of SP + SDO even if the actual value of SDO is unknown.
> On s390x it's always 8-byte-aligned (stack slot size). So the
> right fix should be to add knowledge about the target's guaranteed
> alignment of SP + SDO to the function. I'm right now testing a
> much simpler patch that uses
> REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the
> alignment.
Ciao
Dominik ^_^ ^_^
--
Dominik Vogt
IBM Germany
gcc/ChangeLog
* explow.c (get_dynamic_stack_size): Take known alignment of stack
pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
needed.
Correct a typo in a comment.
>From 7d7a6b7fdb189759eb11b96c93ee4adfa8608a97 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <[email protected]>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH] Reduce size allocated for run time allocated stack
variables.
The present calculation sometimes led to more stack memory being used than
necessary with alloca. First, (STACK_BOUNDARY -1) would be added to the
allocated size:
size = plus_constant (Pmode, size, extra);
size = force_operand (size, NULL_RTX);
Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY. On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.
The problem was that get_dynamic_stack_size did not take into account that the
target might guarantee some alignment of (stack_pointer + STACK_DYNAMIC_OFFSET)
even if the value of STACK_DYNAMIC_OFFSET is not known yet.
---
gcc/explow.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/gcc/explow.c b/gcc/explow.c
index a345690..f97a214 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1224,9 +1224,15 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
example), so we must preventively align the value. We leave space
in SIZE for the hole that might result from the alignment operation. */
- extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
- size = plus_constant (Pmode, size, extra);
- size = force_operand (size, NULL_RTX);
+ unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM);
+ if (known_align == 0)
+ known_align = BITS_PER_UNIT;
+ if (required_align > known_align)
+ {
+ extra = (required_align - known_align) / BITS_PER_UNIT;
+ size = plus_constant (Pmode, size, extra);
+ size = force_operand (size, NULL_RTX);
+ }
if (flag_stack_usage_info && pstack_usage_size)
*pstack_usage_size += extra;
@@ -1235,7 +1241,7 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
size_align = BITS_PER_UNIT;
/* Round the size to a multiple of the required stack alignment.
- Since the stack if presumed to be rounded before this allocation,
+ Since the stack is presumed to be rounded before this allocation,
this will maintain the required alignment.
If the stack grows downward, we could save an insn by subtracting
--
2.3.0