On 22 November 2012 20:53, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
> On 21 November 2012 09:20, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>> On 21 November 2012 03:26, Michael Hope <michael.h...@linaro.org> wrote:
>>> On 20 November 2012 22:10, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>>>> Hi,
>>>>
>>>> I tried ARM, MIPS, PowerPC and x86 on the povray benchmark. None of
>>>> them can shrink-wrap the function Ray_In_Bound.
>>>>
>>>> Here it is:
>>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>> {
>>>>   ...
>>>>   for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
>>>>   {...}
>>>>   return (true);
>>>> }
>>>> For ARM at -O2/-O3, "Bound" is allocated to "r6" during IRA, so there
>>>> is a copy
>>>>
>>>> r6 = r1
>>>>
>>>> before testing Bound != NULL
>>>
>>> Could you hack the benchmark to make the early exit explicit and see
>>> if that changes the result?  That lets us know if improving shrink
>>> wrap is worthwhile.
>>>
>>> Something like:
>>>
>>>  bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>  {
>>>   if (Bounding_Object == NULL) return true;
>>
>> I tried it. The result is the same as the original. (The hacked code
>> gets optimized away.)
>
> After hacking the assembly code, I got a 2-3% performance improvement
> at -O2. Here is the assembly change.
> Original code:
>         push    {r4, r5, r6, r7, r8, r9, lr}
>         .save {r4, r5, r6, r7, r8, r9, lr}
>         mov     r6, r1
>         .pad #196
>         sub     sp, sp, #196
>         cbz     r1, .L113
>         ldr     r8, .L117
>         ...
> .L113:
>         movs    r0, #1
>         add     sp, sp, #196
>         @ sp needed
>         pop     {r4, r5, r6, r7, r8, r9, pc}
>
> After shrink-wrap:
>         cbz     r1, .L1131
>         push    {r4, r5, r6, r7, r8, r9, lr}
>         .save {r4, r5, r6, r7, r8, r9, lr}
>         mov     r6, r1
>         .pad #196
>         sub     sp, sp, #196
>         ldr     r8, .L117
>         ...
> .L113:
>         movs    r0, #1
>         add     sp, sp, #196
>         @ sp needed
>         pop     {r4, r5, r6, r7, r8, r9, pc}
> .L1131:
>         movs    r0, #1
>         bx      lr
>
> But the same simple hack at -O3 shows a ~1% regression. A change in
> code alignment is the likely root cause. To verify that, I added 6 NOPs
> after "bx lr", which makes block .L1131 16 bytes long. With this
> change, -O3 also shows a 2-3% performance improvement.

That's good then.  So modulo supposed alignment changes, your current
shrink wrap patch causes no speed regressions and has the potential to
show an improvement.

Worth finishing and committing.  Shrinkwrap was a mess last time - we
need to check that all of these bugs:
 http://goo.gl/6fGg5

are clear before upstreaming/backporting.

-- Michael

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
