On 23 November 2012 04:25, Michael Hope <michael.h...@linaro.org> wrote:
> On 22 November 2012 20:53, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>> On 21 November 2012 09:20, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:
>>> On 21 November 2012 03:26, Michael Hope <michael.h...@linaro.org> wrote:
>>>> On 20 November 2012 22:10, Zhenqiang Chen <zhenqiang.c...@linaro.org>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I try ARM, MIPS, PowerPC and X86 on povray benchmark. No one can
>>>>> shrink-wrap function Ray_In_Bound.
>>>>>
>>>>> Here is:
>>>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>>> {
>>>>>   ...
>>>>>   for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
>>>>>   {...}
>>>>>   return (true);
>>>>> }
>>>>> For ARM O2/O3, "Bound" is allocated to "r6" during ira. So there is copy
>>>>>
>>>>> r6 = r1 before
>>>>> testing Bound != NULL
>>>>
>>>> Could you hack the benchmark to make the early exit explicit and see
>>>> if that changes the result? That lets us know if improving shrink
>>>> wrap is worthwhile.
>>>>
>>>> Something like:
>>>>
>>>> bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
>>>> {
>>>>   if (Bounding_Object == NULL) return true;
>>>
>>> I had tried it. The result is the same with the original one. (The
>>> hack code is optimized)
>>
>> After hacking the assemble code, I got 2-3% performance improvement
>> for -O2. Here is the assemble change
>> Original code:
>>         push    {r4, r5, r6, r7, r8, r9, lr}
>>         .save   {r4, r5, r6, r7, r8, r9, lr}
>>         mov     r6, r1
>>         .pad    #196
>>         sub     sp, sp, #196
>>         cbz     r1, .L113
>>         ldr     r8, .L117
>>         ...
>> .L113:
>>         movs    r0, #1
>>         add     sp, sp, #196
>>         @ sp needed
>>         pop     {r4, r5, r6, r7, r8, r9, pc}
>>
>> After shrink-wrap:
>>         cbz     r1, .L1131
>>         push    {r4, r5, r6, r7, r8, r9, lr}
>>         .save   {r4, r5, r6, r7, r8, r9, lr}
>>         mov     r6, r1
>>         .pad    #196
>>         sub     sp, sp, #196
>>         ldr     r8, .L117
>>         ...
>> .L113:
>>         movs    r0, #1
>>         add     sp, sp, #196
>>         @ sp needed
>>         pop     {r4, r5, r6, r7, r8, r9, pc}
>> .L1131:
>>         movs    r0, #1
>>         bx      lr
>>
>> But simple hack for -O3 has ~1% regression. "code alignment" change
>> should be the root cause. To verify it, I add 6 NOPs after "bx lr".
>> With it, the size of block .L1131 is 16 Bytes. After this change, O3
>> will have 2-3% performance improvement.
>
> That's good then. So modulo supposed alignment changes, your current
> shrink wrap patch causes no speed regressions and has the potential to
> show an improvement.
>
> Worth finishing and committing. Shrinkwrap was a mess last time - we
> need to check that all of these bugs:
> http://goo.gl/6fGg5
>
> are clear before upstreaming/backporting.
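
For reference, here is a minimal C sketch of the function shape discussed above, with the early exit written out explicitly. It is only an illustration: the RAY and OBJECT definitions and the "inside" field are hypothetical stand-ins (the real povray types are not shown in this thread), and the loop body is a placeholder for the part elided as {...}. The point is that the NULL path needs no callee-saved registers and no stack space, which is why the hand-edited assembly can return with "bx lr" before the push.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real povray RAY and OBJECT are not shown
   in the thread, so these types exist only to make the sketch compile. */
typedef struct ray {
    double origin[3];
    double direction[3];
} RAY;

typedef struct object {
    struct object *Sibling;
    bool inside;            /* placeholder for the real bounding test */
} OBJECT;

bool Ray_In_Bound(RAY *Ray, OBJECT *Bounding_Object)
{
    OBJECT *Bound;

    /* Explicit early exit: nothing on this path needs the prologue
       (no callee-saved registers, no stack frame). */
    if (Bounding_Object == NULL)
        return true;

    for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling) {
        /* Placeholder body (assumption): the original elides it as {...}. */
        if (!Bound->inside)
            return false;
    }

    (void)Ray;              /* unused in this simplified sketch */
    return true;
}
```

Whether a given compiler actually places the prologue after the NULL test still depends on register allocation, as noted above for the "r6 = r1" copy.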

I will build a clean toolchain and verify them.

Thanks!
-Zhenqiang

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain