On Sat, Nov 30, 2013 at 12:34 AM, Richard Earnshaw <rearn...@arm.com> wrote: > On 29/11/13 11:46, Yufeng Zhang wrote: >> On 11/29/13 07:52, Bin.Cheng wrote: >>> After thinking twice, I some kind of think we should not re-associate >>> addresses during expanding, because of lacking of context information. >>> Take base + scaled_index + offset as an example in PR57540, we just >>> don't know if "base+offset" is loop invariant from either backend or >>> RTL expander. >> >> I'm getting less convinced by re-associating base with offset >> unconditionally. One counter example is >> >> typedef int arr_1[20]; >> void foo (arr_1 a1, int i) >> { >> a1[i+10] = 1; >> } >> >> I'm experimenting a patch to get the immediate offset in the above >> example to be the last addend in the address computing (as mentioned in >> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the >> following code-gen: >> >> add r1, r0, r1, asl #2 >> mov r3, #1 >> str r3, [r1, #40] >> >> With your patch applied, the effort will be reverted to >> >> add r0, r0, #40 >> mov r3, #1 >> str r3, [r0, r1, asl #2] >> > > And another one is: > > > > typedef int arr_1[20]; > void foo (arr_1 a1, int i) > { > a1[i+10] = 1; > a1[i+11] = 1; > } > > This should compile to: > > add r1, r0, r1, asl #2 > mov r3, #1 > str r3, [r1, #40] > str r3, [r1, #44] > > And which on Thumb2 should then collapse to: > > add r1, r0, r1, asl #2 > mov r3, #1 > strd r3, r3, [r1, #40] > > With your patch I don't see any chance of being able to get to this > situation. > > (BTW, we currently generate: > > mov r3, #1 > add r1, r1, #10 > add r2, r0, r1, asl #2 > str r3, [r0, r1, asl #2] > str r3, [r2, #4] > > which is insane). > > I think I see where you're coming from on the original testcase, but I > think you're trying to solve the wrong problem. In your test case the > base is an eliminable register, which is likely to be replaced with an > offset expression during register allocation. The problem then, I > think, is that the cost of these virtual registers is treated the same > as any other pseudo register, when it may really have the cost of a PLUS > expression. > > Perhaps the cost of using an eliminable register should be raised in > rtx_costs() (treating them as equivalent to (PLUS (reg) (CONST_INT > (TBD))), so that loop optimizations will try to hoist suitable > sub-expressions out the loop and replace them with real pseudos. > I now have access to the code. The gimple before expanding is like: <bb 6>: # j_26 = PHI <j_9(8), 0(5)> # k_29 = PHI <k_8(8), k_24(5)> j_9 = j_26 + 1; k_8 = parent[k_29]; if (k_8 >= 0) goto <bb 8>; else goto <bb 7>;
The rtl generated after expanding is like: 88: NOTE_INSN_BASIC_BLOCK 7 89: r174:SI=r174:SI+0x1 90: r191:SI=r173:SI<<0x2 91: r190:SI=r105:SI+r191:SI 92: r173:SI=[r190:SI-0x810] with r105 == virtual_stack_vars_rtx, and it will be instantiated into frame_pointer_rtx in vregs pass: 88: NOTE_INSN_BASIC_BLOCK 7 89: r174:SI=r174:SI+0x1 90: r191:SI=r173:SI<<0x2 91: r190:SI=sfp:SI+r191:SI 92: r173:SI=[r190:SI-0x810] As you pointed out, sfp is not hoisted as a high cost invariant. I am not sure if loop-invariant will hoist a single pseudo reg even it's assigned with a higher cost. But before the invariant problem, the choice made by RTL expand is bad because it hides the CSE opportunity, because (sfp + r173<<2 - 0x810) == (sp + r173<<2), (sfp-0x810) can be folded into (sp), then we can embed the shift instruction in scaled addressing mode [sp + r173 << 2]. Thanks, bin -- Best Regards.