On Fri, 1 Jul 2016, Jakub Jelinek wrote:

> On Thu, Jun 30, 2016 at 03:51:20PM +0200, Richard Biener wrote:
> > The following patch fixes PR71632 by removing delayed expansion of
> > TERed defs.  Instead it adds code to apply the scheduling effect
> > to the GIMPLE IL (so you also get better interleaved GIMPLE stmt
> > / generated RTL dumps in .expand).
>
> Does anything from TER survive after this patch?
> I thought the whole point was that the expansion can see through
> the SSA_NAMEs and optimize based on that, by not seeing through
> them it doesn't, or if it somewhere still uses get_gimple_for_ssa_name,
> if the definition will be already expanded, it might expand stuff multiple
> times.

Yes, get_gimple_for_ssa_name is what survives (along with the scheduling
effect, though that is now applied on GIMPLE).  And yes, I noted the issue
of multiple expansions with get_gimple_for_ssa_name in 2) (and that this
is probably no worse than multiple expansion through lazy expansion that
ultimately fails).  And yes, it no longer sees through SSA names in the
cases where the lazy SSA name expansion returned something !REG_P.

Given that the test results show some regressions and the patched cc1 is
0.2% larger, the patch obviously isn't ready yet.  Still, I believe it is
ultimately the way to go, as in theory fwprop/combine should have
everything at hand to recover the original expansion from the single-use
reg defs.

gcc.target/i386/xorps-sse2.c, for example, shows a missing transform on
GIMPLE.  The comment in the testcase,

/* Test that we generate xorps when the result is used in FP math. */

is not reflected by what we do at expansion, which decides locally to
rewrite the vector int f_int ^ g operation as a float operation (and
subregs the result into vector int).  So it would do that even if the
result is used in an integer operation:

vector int x(vector float f, vector int h)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 };
  vector int f_int = (vector int) f;
  return (f_int ^ g) + h;
}

turns into

x:
.LFB1:
        .cfi_startproc
        xorps   .LC0(%rip), %xmm0
        paddd   %xmm1, %xmm0
        ret

but by the testcase's logic it would have been better to use pxor.

Richard.
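
For anyone who wants to look at both situations side by side outside the
testsuite, here is a minimal self-contained sketch.  It assumes GCC's
vector_size extension in place of the testcase's "vector" spelling; the
typedefs v4si/v4sf and the names x_int_use/x_fp_use are made up for
illustration and come from neither the testcase nor the patch.  Which of
xorps/pxor is emitted for each function depends on the compiler version
and flags, so nothing about the generated code is claimed here.

```c
/* Sketch only: typedef and function names are hypothetical.  */
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* The xor result feeds an integer add: the case from the example above,
   where an integer xor (pxor) would be the natural choice.  */
v4si
x_int_use (v4sf f, v4si h)
{
  v4si g = { 0x80000000, 0, 0x80000000, 0 };
  v4si f_int = (v4si) f;          /* bit reinterpretation, not a conversion */
  return (f_int ^ g) + h;
}

/* The xor result feeds an FP add: the case the testcase comment describes,
   where a float xor (xorps) is reasonable.  */
v4sf
x_fp_use (v4sf f, v4sf h)
{
  v4si g = { 0x80000000, 0, 0x80000000, 0 };
  v4si f_int = (v4si) f;          /* bit reinterpretation, not a conversion */
  return ((v4sf) (f_int ^ g)) + h;
}
```

Compiling this with -O2 -S (plus -msse2 on 32-bit x86) and comparing the
two function bodies shows whether the xor instruction choice follows the
use of the result or is decided locally at expansion.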