On Tue, Mar 26, 2013 at 01:23:58AM +0400, Dinar Temirbulatov wrote:
> Hi,
> We noticed some performance gains if we are not using jump over some
> simple switch statements. Here is the idea: Check whether the switch
> statement can be expanded with conditional instructions. In that case
> jump tables should be avoided since some branch instructions can be
> eliminated in further passes (replaced by conditional execution).
>
> For example:
> switch (i)
> {
> case 1: sum += 1;
> case 2: sum += 3;
> case 3: sum += 5;
> case 4: sum += 10;
> }
>
> Using jump tables the following code will be generated (ARM assembly):
>
> ldrcc pc, [pc, r0, lsl #2]
> b .L5
> .L0:
> .word L1
> .word L2
> .word L3
> .word L4
>
> .L1:
> add r3, #1
> .L2:
> add r3, #4
> .L3:
> add r3, #5
> .L4:
> add r3, #10
> .L5
>
> Although this code has a constant complexity it can be improved by the
> conditional execution to avoid implicit branching:
>
> cmp r0,1
> addeq r3, #1
> cmp r0,2
> addeq r3, #4
> cmp r0,3
> addeq r3, #5
> cmp r0,4
> addeq r3, #10
>
> Although the assembly below requires more assembly instructions to be
> executed, it doesn't violate the CPU pipeline (since no branching is
> performed).
>
How simple are other expansions? You can rewrite this as
a={1,4,5,10}
sum += a[i]
and in similar cases it is posible to set a[i]=0 to simulate nop.
In this example is also second optimization possible, jump table is not
neccessary and address can be computed directly.
> The original version of patch for was developed by Alexey Kravets. I
> measured some performance improvements/regressions using spec 2000 int
> benchmark on Samsumg's exynos 5250. Here is the result:
>
> before:
> Base Base Base Peak
> Peak Peak
> Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
> ------------ -------- -------- -------- --------
> -------- --------
> 164.gzip 1400 287 487* 1400 288 485*
> 175.vpr 1400 376 373* 1400 374 374*
> 176.gcc 1100 121 912* 1100 118 933*
> 181.mcf 1800 242 743* 1800 251 718*
> 186.crafty 1000 159 628* 1000 165 608*
> 197.parser 1800 347 518* 1800 329 547*
> 252.eon 1300 960 135* 1300 960 135*
> 253.perlbmk 1800 214 842* 1800 212 848*
> 254.gap 1100 138 797* 1100 136 806*
> 255.vortex 1900 253 750* 1900 255 744*
> 256.bzip2 1500 237 632* 1500 230 653*
> 300.twolf X X
> SPECint_base2000 561
> SPECint2000 563
>
> After:
> 164.gzip 1400 286 490 * 1400 288 486 *
> 175.vpr 1400 213 656 * 1400 215 650 *
> 176.gcc 1100 119 923 * 1100 118 933 *
> 181.mcf 1800 247 730 * 1800 251 717 *
> 186.crafty 1000 145 688 * 1000 150 664 *
> 197.parser 1800 296 608 * 1800 275 654 *
> 252.eon X X
> 253.perlbmk 1800 206 872 * 1800 211 853 *
> 254.gap 1100 133 825 * 1100 131 838 *
> 255.vortex 1900 241 789 * 1900 239 797 *
> 256.bzip2 1500 235 638 * 1500 226 663 *
> 300.twolf X X
>
> The error in 252.eon was due to incorrect setup. Also "if (count >
> 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
> correct, and probably it is better to move this code in the earlier
> stage just before the gimple expand and keep preference expand state
> (jump-tables or not) for every switch statement to avoid dealing with
> the RTL altogether.
>
> thanks, Dinar.