Re: [PATCH ARM]Refine scaled address expression on ARM

Bin.Cheng Sun, 01 Dec 2013 19:23:05 -0800

On Sat, Nov 30, 2013 at 12:34 AM, Richard Earnshaw <rearn...@arm.com> wrote:
> On 29/11/13 11:46, Yufeng Zhang wrote:
>> On 11/29/13 07:52, Bin.Cheng wrote:
>>> After thinking twice, I some kind of think we should not re-associate
>>> addresses during expanding, because of lacking of context information.
>>>   Take base + scaled_index + offset as an example in PR57540, we just
>>> don't know if "base+offset" is loop invariant from either backend or
>>> RTL expander.
>>
>> I'm getting less convinced by re-associating base with offset
>> unconditionally.  One counter example is
>>
>> typedef int arr_1[20];
>> void foo (arr_1 a1, int i)
>> {
>>    a1[i+10] = 1;
>> }
>>
>> I'm experimenting a patch to get the immediate offset in the above
>> example to be the last addend in the address computing (as mentioned in
>> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the
>> following code-gen:
>>
>>          add     r1, r0, r1, asl #2
>>          mov     r3, #1
>>          str     r3, [r1, #40]
>>
>> With your patch applied, the effort will be reverted to
>>
>>          add     r0, r0, #40
>>          mov     r3, #1
>>          str     r3, [r0, r1, asl #2]
>>
>
> And another one is:
>
>
>
> typedef int arr_1[20];
> void foo (arr_1 a1, int i)
> {
>    a1[i+10] = 1;
>    a1[i+11] = 1;
> }
>
> This should compile to:
>
>         add     r1, r0, r1, asl #2
>         mov     r3, #1
>         str     r3, [r1, #40]
>         str     r3, [r1, #44]
>
> And which on Thumb2 should then collapse to:
>
>         add     r1, r0, r1, asl #2
>         mov     r3, #1
>         strd    r3, r3, [r1, #40]
>
> With your patch I don't see any chance of being able to get to this
> situation.
>
> (BTW, we currently generate:
>
>         mov     r3, #1
>         add     r1, r1, #10
>         add     r2, r0, r1, asl #2
>         str     r3, [r0, r1, asl #2]
>         str     r3, [r2, #4]
>
> which is insane).
>
> I think I see where you're coming from on the original testcase, but I
> think you're trying to solve the wrong problem.   In your test case the
> base is an eliminable register, which is likely to be replaced with an
> offset expression during register allocation.  The problem then, I
> think, is that the cost of these virtual registers is treated the same
> as any other pseudo register, when it may really have the cost of a PLUS
> expression.
>
> Perhaps the cost of using an eliminable register should be raised in
> rtx_costs() (treating them as equivalent to (PLUS (reg) (CONST_INT
> (TBD))), so that loop optimizations will try to hoist suitable
> sub-expressions out the loop and replace them with real pseudos.
>
I now have access to the code.
The gimple before expanding is like:
  <bb 6>:
  # j_26 = PHI <j_9(8), 0(5)>
  # k_29 = PHI <k_8(8), k_24(5)>
  j_9 = j_26 + 1;
  k_8 = parent[k_29];
  if (k_8 >= 0)
    goto <bb 8>;
  else
    goto <bb 7>;


The rtl generated after expanding is like:
   88: NOTE_INSN_BASIC_BLOCK 7
   89: r174:SI=r174:SI+0x1
   90: r191:SI=r173:SI<<0x2
   91: r190:SI=r105:SI+r191:SI
   92: r173:SI=[r190:SI-0x810]

with r105 == virtual_stack_vars_rtx, and it will be instantiated into
frame_pointer_rtx in vregs pass:
   88: NOTE_INSN_BASIC_BLOCK 7
   89: r174:SI=r174:SI+0x1
   90: r191:SI=r173:SI<<0x2
   91: r190:SI=sfp:SI+r191:SI
   92: r173:SI=[r190:SI-0x810]

As you pointed out, sfp is not hoisted as a high cost invariant.  I am
not sure if loop-invariant will hoist a single pseudo reg even it's
assigned with a higher cost.
But before the invariant problem, the choice made by RTL expand is bad
because it hides the CSE opportunity, because (sfp + r173<<2 - 0x810)
== (sp + r173<<2), (sfp-0x810) can be folded into (sp), then we can
embed the shift instruction in scaled addressing mode [sp + r173 <<
2].

Thanks,
bin

-- 
Best Regards.

Re: [PATCH ARM]Refine scaled address expression on ARM

Reply via email to