On Sun, Jan 14, 2018 at 10:40:36PM +0100, Arnd Bergmann wrote:
> Right. I've done some more investigation anyway, starting over with the
> analysis of the gcc options that change it. I've found now that turning
> off '-fcode-hoisting' but leaving on the other options I had suspected
> earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the
> stack problem, and appears to result in the best performance so
> far.

Oh nice!

> I need to rerun the whole test matrix, but that seems rather
> promising, and the result may also help debug what's really happening.

-fcode-hoisting moves all expression evaluation to as early as possible;
for this AES code that means it will increase register pressure a lot,
causing a lot of spilling (well, that is my guess).  If that is so, then
we need to dial down -fcode-hoisting a bit, maybe make it aware of
register pressure.
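
To illustrate the kind of transformation I mean, here is a minimal sketch.
It is a made-up example, not the kernel AES code, and the function names
are hypothetical.  The second function is written by hand to show roughly
what hoisting does to the first; it is not actual compiler output.

```c
#include <stdint.h>

/* Hypothetical example, not taken from the kernel AES code. */

/* As written: a * b is evaluated in each branch, after the test. */
uint32_t mix_original(uint32_t a, uint32_t b, uint32_t c, int sel)
{
	if (sel)
		return (a * b) + c;
	else
		return (a * b) ^ c;
}

/*
 * Hand-written equivalent of the hoisted form: the expression common
 * to both paths is evaluated once, above the branch, so its result t
 * has to be kept live across the branch.
 */
uint32_t mix_hoisted(uint32_t a, uint32_t b, uint32_t c, int sel)
{
	uint32_t t = a * b;

	if (sel)
		return t + c;
	else
		return t ^ c;
}
```

With one value this is harmless.  With many such values hoisted at once,
all of them are live at the same time, and whatever does not fit in
registers has to go to the stack.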

Glad you found a smoking gun,


Segher
