On 23 November 2012 04:25, Michael Hope <michael.h...@linaro.org> wrote:
> On 22 November 2012 20:53, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>> On 21 November 2012 09:20, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>>> On 21 November 2012 03:26, Michael Hope <michael.h...@linaro.org> wrote:
>>>> On 20 November 2012 22:10, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>>>>> Hi,
>>>>>
>>>>> I tried ARM, MIPS, PowerPC and x86 on the povray benchmark. None of them
>>>>> can shrink-wrap the function Ray_In_Bound.
>>>>>
>>>>> Here it is:
>>>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>>> {
>>>>>   ...
>>>>>   for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
>>>>>   {...}
>>>>>   return (true);
>>>>> }
>>>>> For ARM at -O2/-O3, "Bound" is allocated to r6 during IRA, so there is
>>>>> a copy "r6 = r1" before the "Bound != NULL" test.
>>>>
>>>> Could you hack the benchmark to make the early exit explicit and see
>>>> if that changes the result?  That lets us know if improving shrink
>>>> wrap is worthwhile.
>>>>
>>>> Something like:
>>>>
>>>>  bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>>  {
>>>>   if (Bounding_Object == NULL) return true;
>>>
>>> I had tried it. The result is the same as with the original code (the
>>> hacked code is optimized in the same way).
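For reference, here is a self-contained C sketch of the two source shapes being compared: the original loop, where the NULL check is folded into the loop condition, and the suggested variant with an explicit early exit. RAY, OBJECT and ray_hits_bound() are hypothetical stand-ins. Only the Sibling link comes from the original snippet, and the real loop body is elided there, so the per-bound test below is invented purely for illustration. The two functions behave identically for every input, which is consistent with the report that the explicit check did not change the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for povray's RAY and OBJECT types. Only the
   Sibling link from the original loop is kept; `hit` is a made-up flag
   that replaces the real (elided) per-bound test. */
typedef struct RAY { int unused; } RAY;
typedef struct OBJECT {
    struct OBJECT *Sibling;
    bool hit; /* hypothetical: does the ray pass this bound? */
} OBJECT;

/* Hypothetical per-bound test standing in for the elided loop body. */
static bool ray_hits_bound(const RAY *Ray, const OBJECT *Bound)
{
    (void)Ray;
    return Bound->hit;
}

/* Original shape: the empty-list case is handled by the loop condition. */
bool Ray_In_Bound(RAY *Ray, OBJECT *Bounding_Object)
{
    OBJECT *Bound;
    for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling) {
        if (!ray_hits_bound(Ray, Bound))
            return false;
    }
    return true;
}

/* Suggested variant: make the empty-list early exit explicit. */
bool Ray_In_Bound_early_exit(RAY *Ray, OBJECT *Bounding_Object)
{
    OBJECT *Bound;
    if (Bounding_Object == NULL)
        return true;
    for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling) {
        if (!ray_hits_bound(Ray, Bound))
            return false;
    }
    return true;
}
```

Because the explicit check is redundant with the loop condition, an optimizing compiler is free to merge them, so the source-level hack alone cannot force the push/sub prologue below the NULL test.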
>>
>> After hacking the assembly code, I got a 2-3% performance improvement
>> at -O2. Here is the assembly change.
>> Original code:
>>         push    {r4, r5, r6, r7, r8, r9, lr}
>>         .save {r4, r5, r6, r7, r8, r9, lr}
>>         mov     r6, r1
>>         .pad #196
>>         sub     sp, sp, #196
>>         cbz     r1, .L113
>>         ldr     r8, .L117
>>         ...
>> .L113:
>>         movs    r0, #1
>>         add     sp, sp, #196
>>         @ sp needed
>>         pop     {r4, r5, r6, r7, r8, r9, pc}
>>
>> After shrink-wrap:
>>         cbz     r1, .L1131
>>         push    {r4, r5, r6, r7, r8, r9, lr}
>>         .save {r4, r5, r6, r7, r8, r9, lr}
>>         mov     r6, r1
>>         .pad #196
>>         sub     sp, sp, #196
>>         ldr     r8, .L117
>>         ...
>> .L113:
>>         movs    r0, #1
>>         add     sp, sp, #196
>>         @ sp needed
>>         pop     {r4, r5, r6, r7, r8, r9, pc}
>> .L1131:
>>         movs    r0, #1
>>         bx      lr
>>
>> But the same simple hack at -O3 gives a ~1% regression. A change in code
>> alignment is likely the root cause. To verify this, I added 6 NOPs after
>> "bx lr", which makes the .L1131 block 16 bytes. With this change, -O3
>> also shows a 2-3% performance improvement.
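The 16-byte figure can be checked with a little arithmetic. The sketch below assumes that "movs r0, #1", "bx lr" and "nop" each use their narrow 16-bit (2-byte) Thumb-2 encodings; that is an assumption about what the assembler emitted, not something stated in the thread. Under it, the bare fast-path block is 4 bytes, and 6 NOPs are exactly what is needed to round it up to 16 bytes. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes in bytes of the narrow (16-bit) Thumb-2 encodings. */
enum {
    MOVS_IMM_BYTES = 2, /* movs r0, #1 */
    BX_LR_BYTES    = 2, /* bx lr */
    NOP_BYTES      = 2  /* nop, 16-bit encoding */
};

/* Hypothetical helper: size of the .L1131 fast-path block when
   n_nops NOPs are appended after "bx lr". */
int fast_path_block_bytes(int n_nops)
{
    return MOVS_IMM_BYTES + BX_LR_BYTES + n_nops * NOP_BYTES;
}

/* Hypothetical helper: NOPs needed to round the bare block up to a
   multiple of `align` bytes (align assumed to be a multiple of 2). */
int nops_to_pad(int align)
{
    int rem = fast_path_block_bytes(0) % align;
    return rem == 0 ? 0 : (align - rem) / NOP_BYTES;
}
```

With these assumptions, fast_path_block_bytes(0) is 4, nops_to_pad(16) is 6, and the padded block is 16 bytes, matching the numbers reported above.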
>
> That's good then.  So modulo supposed alignment changes, your current
> shrink wrap patch causes no speed regressions and has the potential to
> show an improvement.
>
> Worth finishing and committing.  Shrinkwrap was a mess last time - we
> need to check that all of these bugs:
>  http://goo.gl/6fGg5
>
> are clear before upstreaming/backporting.

I will build a clean toolchain and verify them.

Thanks!
-Zhenqiang

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
