On 22 November 2012 20:53, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
> On 21 November 2012 09:20, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>> On 21 November 2012 03:26, Michael Hope <michael.h...@linaro.org> wrote:
>>> On 20 November 2012 22:10, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>>>> Hi,
>>>>
>>>> I tried ARM, MIPS, PowerPC and X86 on the povray benchmark. None of
>>>> them can shrink-wrap the function Ray_In_Bound.
>>>>
>>>> Here it is:
>>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>> {
>>>>   ...
>>>>   for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
>>>>   {...}
>>>>   return (true);
>>>> }
>>>> For ARM O2/O3, "Bound" is allocated to "r6" during ira. So there is a copy
>>>>
>>>> r6 = r1 before
>>>> testing Bound != NULL
>>>
>>> Could you hack the benchmark to make the early exit explicit and see
>>> if that changes the result?  That lets us know if improving shrink
>>> wrap is worthwhile.
>>>
>>> Something like:
>>>
>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>> {
>>>   if (Bounding_Object == NULL) return true;
>>
>> I had tried it. The result is the same as the original one. (The
>> hack code is optimized)
>
> After hacking the assembly code, I got a 2-3% performance improvement
> for -O2. Here is the assembly change:
>
> Original code:
>         push    {r4, r5, r6, r7, r8, r9, lr}
>         .save {r4, r5, r6, r7, r8, r9, lr}
>         mov     r6, r1
>         .pad #196
>         sub     sp, sp, #196
>         cbz     r1, .L113
>         ldr     r8, .L117
>         ...
> .L113:
>         movs    r0, #1
>         add     sp, sp, #196
>         @ sp needed
>         pop     {r4, r5, r6, r7, r8, r9, pc}
>
> After shrink-wrap:
>         cbz     r1, .L1131
>         push    {r4, r5, r6, r7, r8, r9, lr}
>         .save {r4, r5, r6, r7, r8, r9, lr}
>         mov     r6, r1
>         .pad #196
>         sub     sp, sp, #196
>         ldr     r8, .L117
>         ...
> .L113:
>         movs    r0, #1
>         add     sp, sp, #196
>         @ sp needed
>         pop     {r4, r5, r6, r7, r8, r9, pc}
> .L1131:
>         movs    r0, #1
>         bx      lr
>
> But the simple hack for -O3 has a ~1% regression. The "code alignment"
> change should be the root cause. To verify it, I added 6 NOPs after
> "bx lr". With them, the size of block .L1131 is 16 bytes. After this
> change, O3 has a 2-3% performance improvement.

That's good then.  So modulo supposed alignment changes, your current
shrink wrap patch causes no speed regressions and has the potential to
show an improvement.  Worth finishing and committing.

Shrinkwrap was a mess last time - we need to check that all of these bugs:
 http://goo.gl/6fGg5

are clear before upstreaming/backporting.

-- Michael

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
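
P.S. For anyone reading the thread later, here is a rough C sketch of what the hand-edited assembly achieves: the NULL test runs before any frame setup, and everything that needs the big frame sits behind it. This is only an illustration under assumptions, not the povray source and not what the shrink-wrap pass does internally. The OBJECT layout, the "radius" field, the "dist" parameter and the helper ray_in_bound_slow are all made up for the example, and the noinline attribute is a GCC/Clang extension. Whether the compiler really emits a frame-free fast path still depends on the target and options.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for povray's OBJECT; only the Sibling link
   mirrors the code quoted in the thread. */
typedef struct Object {
    struct Object *Sibling;
    double radius;              /* made-up field for the example */
} OBJECT;

/* Slow path: owns the large local frame, so it is kept out of line.
   noinline is a GCC/Clang extension. */
__attribute__((noinline))
static bool ray_in_bound_slow(const OBJECT *bound, double dist)
{
    volatile double scratch[24];    /* roughly mimics the 196-byte frame */

    for (; bound != NULL; bound = bound->Sibling) {
        scratch[0] = bound->radius;
        if (dist > scratch[0])
            return false;           /* outside one of the bounds */
    }
    return true;
}

/* Fast path: the NULL check needs no locals, so the early return
   can in principle happen before any push / sub sp. */
bool ray_in_bound(const OBJECT *bounding_object, double dist)
{
    if (bounding_object == NULL)
        return true;
    return ray_in_bound_slow(bounding_object, dist);
}
```

Calling ray_in_bound(NULL, 1.0) takes the early return, which corresponds to the new .L1131 block above; any non-NULL list goes through the helper and pays for the frame as before.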