https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533
Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2012-05-31 00:00:00         |2015-5-3
                 CC|                            |trippels at gcc dot gnu.org

--- Comment #26 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
For gcc-5 and gcc-6 there is an additional 50% slowdown:

 % g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 10000
./loop_unroll 10000

test                 description   absolute   operations   ratio with
number                                 time   per second   test0

 0  "int32_t for loop unroll 1"    0.14 sec    552.30 M     1.00
 1  "int32_t for loop unroll 2"    0.11 sec    699.49 M     0.79
 2  "int32_t for loop unroll 3"    0.14 sec    566.56 M     0.97
 3  "int32_t for loop unroll 4"    0.15 sec    532.87 M     1.04
 4  "int32_t for loop unroll 5"    0.10 sec    784.70 M     0.70
 5  "int32_t for loop unroll 6"    0.09 sec    887.12 M     0.62
 6  "int32_t for loop unroll 7"    0.09 sec    913.50 M     0.60
 7  "int32_t for loop unroll 8"    0.08 sec    986.45 M     0.56
 8  "int32_t for loop unroll 9"    0.23 sec    346.06 M     1.60
 9  "int32_t for loop unroll 10"   0.08 sec   1040.06 M     0.53
10  "int32_t for loop unroll 11"   0.23 sec    348.02 M     1.59
11  "int32_t for loop unroll 12"   0.23 sec    353.38 M     1.56
12  "int32_t for loop unroll 13"   0.24 sec    338.32 M     1.63
13  "int32_t for loop unroll 14"   0.24 sec    332.32 M     1.66
14  "int32_t for loop unroll 15"   0.25 sec    321.15 M     1.72
15  "int32_t for loop unroll 16"   0.25 sec    318.23 M     1.74
16  "int32_t for loop unroll 17"   0.24 sec    329.43 M     1.68
17  "int32_t for loop unroll 18"   0.25 sec    321.34 M     1.72
18  "int32_t for loop unroll 19"   0.25 sec    314.53 M     1.76
19  "int32_t for loop unroll 20"   0.25 sec    325.33 M     1.70
20  "int32_t for loop unroll 21"   0.25 sec    323.67 M     1.71
21  "int32_t for loop unroll 22"   0.25 sec    316.85 M     1.74
22  "int32_t for loop unroll 23"   0.25 sec    323.51 M     1.71
23  "int32_t for loop unroll 24"   0.06 sec   1257.94 M     0.44
24  "int32_t for loop unroll 25"   0.24 sec    327.77 M     1.69
25  "int32_t for loop unroll 26"   0.06 sec   1310.44 M     0.42
26  "int32_t for loop unroll 27"   0.07 sec   1072.85 M     0.51
27  "int32_t for loop unroll 28"   0.28 sec    283.44 M     1.95
28  "int32_t for loop unroll 29"   0.30 sec    267.96 M     2.06
29  "int32_t for loop unroll 30"   0.31 sec    258.88 M     2.13
30  "int32_t for loop unroll 31"   0.06 sec   1337.64 M     0.41
31  "int32_t for loop unroll 32"   0.06 sec   1315.10 M     0.42

Total absolute time for int32_t for loop unrolling: 5.85 sec
...
./loop_unroll 10000  41.43s user 0.00s system 100% cpu 41.426 total

==============================================================================

 % /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.2/g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 10000
./loop_unroll 10000

test                 description   absolute   operations   ratio with
number                                 time   per second   test0

 0  "int32_t for loop unroll 1"    0.14 sec    582.13 M     1.00
 1  "int32_t for loop unroll 2"    0.13 sec    625.41 M     0.93
 2  "int32_t for loop unroll 3"    0.13 sec    635.76 M     0.92
 3  "int32_t for loop unroll 4"    0.13 sec    625.41 M     0.93
 4  "int32_t for loop unroll 5"    0.12 sec    640.96 M     0.91
 5  "int32_t for loop unroll 6"    0.09 sec    888.11 M     0.66
 6  "int32_t for loop unroll 7"    0.09 sec    900.10 M     0.65
 7  "int32_t for loop unroll 8"    0.10 sec    832.20 M     0.70
 8  "int32_t for loop unroll 9"    0.10 sec    834.22 M     0.70
 9  "int32_t for loop unroll 10"   0.09 sec    902.04 M     0.65
10  "int32_t for loop unroll 11"   0.10 sec    805.15 M     0.72
11  "int32_t for loop unroll 12"   0.10 sec    823.27 M     0.71
12  "int32_t for loop unroll 13"   0.09 sec    860.51 M     0.68
13  "int32_t for loop unroll 14"   0.11 sec    753.59 M     0.77
14  "int32_t for loop unroll 15"   0.10 sec    781.96 M     0.74
15  "int32_t for loop unroll 16"   0.09 sec    858.76 M     0.68
16  "int32_t for loop unroll 17"   0.09 sec    846.91 M     0.69
17  "int32_t for loop unroll 18"   0.10 sec    783.19 M     0.74
18  "int32_t for loop unroll 19"   0.10 sec    794.81 M     0.73
19  "int32_t for loop unroll 20"   0.10 sec    806.70 M     0.72
20  "int32_t for loop unroll 21"   0.10 sec    823.82 M     0.71
21  "int32_t for loop unroll 22"   0.09 sec    851.74 M     0.68
22  "int32_t for loop unroll 23"   0.10 sec    792.87 M     0.73
23  "int32_t for loop unroll 24"   0.10 sec    809.32 M     0.72
24  "int32_t for loop unroll 25"   0.10 sec    832.18 M     0.70
25  "int32_t for loop unroll 26"   0.10 sec    781.11 M     0.75
26  "int32_t for loop unroll 27"   0.10 sec    792.40 M     0.73
27  "int32_t for loop unroll 28"   0.10 sec    817.22 M     0.71
28  "int32_t for loop unroll 29"   0.10 sec    826.40 M     0.70
29  "int32_t for loop unroll 30"   0.10 sec    803.83 M     0.72
30  "int32_t for loop unroll 31"   0.10 sec    803.48 M     0.72
31  "int32_t for loop unroll 32"   0.10 sec    796.88 M     0.73

Total absolute time for int32_t for loop unrolling: 3.28 sec
...
./loop_unroll 10000  22.75s user 0.00s system 100% cpu 22.746 total

clang:
./loop_unroll 10000  12.93s user 0.00s system 100% cpu 12.933 total

icpc (5* faster than gcc-5):
./loop_unroll 10000  8.38s user 0.00s system 99% cpu 8.382 total
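
To make the comparison between the four totals easier to check, here is a minimal sketch of the arithmetic. The helper name time_ratio is hypothetical and not part of loop_unroll; the inputs are simply the "user" times reported by time above.

```cpp
#include <cassert>

// Hypothetical helper (an assumption for illustration, not part of
// loop_unroll): returns how many times longer slower_sec is than
// faster_sec. Both arguments are run times in seconds.
inline double time_ratio(double slower_sec, double faster_sec) {
    // The ratio is undefined for a non-positive baseline.
    assert(faster_sec > 0.0 && "baseline time must be positive");
    return slower_sec / faster_sec;
}
```

With the user times above, time_ratio(41.43, 22.75) is about 1.82 (gcc-5 relative to gcc-4.9.2), time_ratio(41.43, 12.93) is about 3.20 (relative to clang), and time_ratio(41.43, 8.38) is about 4.94 (relative to icpc), which is where the "5* faster" figure comes from. For the int32_t section alone, time_ratio(5.85, 3.28) is about 1.78.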