Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

Wilco Dijkstra Fri, 13 Jan 2017 11:52:26 -0800

James Greenhalgh wrote:

> I've been putting off reviewing this patch for a while now, because I don't
> understand enough about the current eh_return code to understand why what
> you're proposing is correct.
>
> The best way to progress this patch would be to go in to more detail as to
> what the current code does, why we don't need it, and why your new
> implementation is sufficient. The current code looks like it covers special
> cases, and calls out DSE and CSELIB as needing special handling - your
> new code doesn't.
> 
> Could you explain your patch to me assuming I know very little about the
> implementation, with particular focus on why we no longer need the special
> cases?


Well eh_return is far more complex than the few lines of assembly code it was
intending to avoid...

Basically the EH return feature wants the capability to either return normally 
or return
to a previous frame after unwinding. Oddly enough the design is to use a single 
shared
return sequence. The epilog is exactly like a normal epilog except that it has 
an extra
input register which contains the stack adjustment which must be applied after 
the
frame has been destroyed. An extra label is inserted before the eplilog which 
sets this
stack adjustment to zero, and this is the entry point for a normal return. 

An EH return updates the return address, initializes the stack adjustment and 
jumps
directly into the epilog (bypassing the zeroing of the adjustment). Sounds 
insane already?
Well it gets worse...

Given the return address is typically saved on the stack when a function makes 
a 
call, we now need to overwrite the saved LR outside the epilog. This poses a 
problem 
as we need to emit this store before the epilog is emitted, ie. before the 
offset of LR is 
even known. We also need to ensure the store is not removed. 

The existing implementation tries to do this by using a special eh_return 
pattern which
emits the store as late as possible (at least after the frame layout has been 
finalized).
However it doesn't do this late enough so DSE is still run after it...

To try to avoid DSE removing the store, the existing implementation tries to 
make the 
store alias with the load by using the same base register and offset. Given the 
epilog 
always reads LR from SP, it follows that whenever FP is used for the store, it 
will be
optimized away.

Even when it uses SP as the base for the store, SP may be updated in the epilog 
before
LR is read (for example alloca functions or frames with large locals or 
outgoing arguments
will update SP before reading LR). Worse, the offset it uses for the store is 
incorrect, so if
the store is not optimized, it just corrupts a random callee-save rather than 
overwriting LR.

So basically to conclude the existing implementation only works for one 
specific case
where there are a specific number of callee-saves, no floating point registers, 
no outgoing
arguments, no alloca used, a small number of locals, and frame pointer is 
enabled.

The new implementation is trivially correct by forcing the use of a frame 
pointer so that
the location of LR is fixed, known early, and using a volatile means no 
optimization could
remove the store. It's also much simpler by avoiding all the workarounds using 
patterns,
splitters and trying to ensure the store appears not dead.

Wilco

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

Reply via email to