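OK, it seems the improvement depends heavily on the compiler version and
on whether the build is 32-bit or 64-bit. I did some more measurements
and the results are mixed: every 64-bit build gets faster, every 32-bit
build gets slower.
All tests decode 1000 frames of a 1080p 175M DNxHD file, single-threaded,
on an i7-4770K. The following table shows the change in overall decoding
speed relative to the old code (from the elapsed time reported by perf
stat, averaged over 10 runs; positive means faster):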

compiler         |   new   |  new without unlikely
-----------------+---------+----------------------
32-bit gcc 4.9   |  -9.4%  |  -11.2%
32-bit gcc 5.4   |  -8.4%  |   -0.9%
32-bit gcc 6.1   |  -4.9%  |   -2.6%
32-bit clang 3.6 |  -8.1%  |   -8.4%
64-bit gcc 4.9   | +20.7%  |  +21.5%
64-bit gcc 5.4   | +20.6%  |  +22.0%
64-bit gcc 6.1   | +22.1%  |  +22.5%
64-bit clang 3.6 | +13.3%  |  +13.2%

Full perf output from those runs is attached if anyone wants to see it.
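
For anyone who wants to re-derive the table from the attached output, here
is a minimal Python sketch. It assumes each figure is old_time / new_time - 1,
computed from the mean "seconds time elapsed" values; that formula is
inferred from the numbers rather than stated anywhere. The names ELAPSED,
speed_change and table_rows are made up for this example.

```python
# Mean elapsed seconds copied from the attached perf stat output.
# Keys are (bits, compiler); values are (old, new, new_nounlikely).
ELAPSED = {
    ("32", "gcc 4.9"): (11.083579198, 12.237069396, 12.478492178),
    ("32", "gcc 5.4"): (11.952728265, 13.055509559, 12.065776540),
    ("32", "gcc 6.1"): (11.851566764, 12.463702050, 12.171179979),
    ("32", "clang 3.6"): (11.770035225, 12.805774319, 12.849132285),
    ("64", "gcc 4.9"): (11.685292368, 9.681921377, 9.620367022),
    ("64", "gcc 5.4"): (11.704448398, 9.704087918, 9.594918002),
    ("64", "gcc 6.1"): (11.676273258, 9.558567796, 9.534586166),
    ("64", "clang 3.6"): (11.097386913, 9.792169837, 9.807219620),
}


def speed_change(old_s, new_s):
    """Relative change in decoding speed in percent (positive = faster).

    Assumed formula: the same amount of work done in less time means a
    proportionally higher speed, so the ratio is old time over new time.
    """
    return (old_s / new_s - 1.0) * 100.0


def table_rows():
    """Return (label, change for new, change for new_nounlikely) tuples."""
    rows = []
    for (bits, compiler), (old, new, nounlikely) in ELAPSED.items():
        label = f"{bits}-bit {compiler}"
        rows.append((label, speed_change(old, new), speed_change(old, nounlikely)))
    return rows


if __name__ == "__main__":
    for label, new, nounlikely in table_rows():
        print(f"{label:<17}| {new:+6.1f}% | {nounlikely:+6.1f}%")
```

With this formula every value agrees with the table to within 0.1
percentage point.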

-- 
Anton Khirnov
32/new_nounlikely/gcc54

 Performance counter stats for '32/new_nounlikely/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12028.489329      task-clock (msec)         #    0.997 CPUs utilized
                47      context-switches          #    0.004 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,375      page-faults               #    0.019 M/sec
    46,803,400,832      cycles                    #    3.879 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    97,411,737,312      instructions              #    2.08  insns per cycle
    10,809,507,723      branches                  #  895.930 M/sec
       402,696,922      branch-misses             #    3.73% of all branches

      12.065776540 seconds time elapsed  ( +-  0.30% )

32/new_nounlikely/gcc49

 Performance counter stats for '32/new_nounlikely/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12481.023731      task-clock (msec)         #    1.000 CPUs utilized
                50      context-switches          #    0.004 K/sec
                13      cpu-migrations            #    0.001 K/sec
           227,412      page-faults               #    0.018 M/sec
    48,564,231,164      cycles                    #    3.892 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    97,812,967,530      instructions              #    2.01  insns per cycle
    12,071,782,437      branches                  #  967.466 M/sec
       402,854,408      branch-misses             #    3.34% of all branches

      12.478492178 seconds time elapsed  ( +-  0.03% )

32/new_nounlikely/clang36

 Performance counter stats for '32/new_nounlikely/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12815.982410      task-clock (msec)         #    0.997 CPUs utilized
                52      context-switches          #    0.004 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,339      page-faults               #    0.018 M/sec
    49,867,576,109      cycles                    #    3.881 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   104,793,898,107      instructions              #    2.10  insns per cycle
    13,215,236,459      branches                  # 1028.545 M/sec
       367,222,477      branch-misses             #    2.78% of all branches

      12.849132285 seconds time elapsed  ( +-  0.27% )

32/new_nounlikely/gcc61

 Performance counter stats for '32/new_nounlikely/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12287.806127      task-clock (msec)         #    1.010 CPUs utilized
                48      context-switches          #    0.004 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,339      page-faults               #    0.019 M/sec
    47,812,414,857      cycles                    #    3.929 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    97,019,067,015      instructions              #    2.05  insns per cycle
    10,808,732,579      branches                  #  888.112 M/sec
       405,868,186      branch-misses             #    3.75% of all branches

      12.171179979 seconds time elapsed  ( +-  0.22% )

32/new/gcc54

 Performance counter stats for '32/new/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      13006.111334      task-clock (msec)         #    0.996 CPUs utilized
               109      context-switches          #    0.008 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,375      page-faults               #    0.017 M/sec
    50,606,704,528      cycles                    #    3.877 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   101,124,583,693      instructions              #    1.99  insns per cycle
    10,733,335,664      branches                  #  822.194 M/sec
       409,887,262      branch-misses             #    3.82% of all branches

      13.055509559 seconds time elapsed  ( +-  0.20% )

32/new/gcc49

 Performance counter stats for '32/new/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12226.157533      task-clock (msec)         #    0.999 CPUs utilized
                48      context-switches          #    0.004 K/sec
                14      cpu-migrations            #    0.001 K/sec
           227,414      page-faults               #    0.019 M/sec
    47,572,541,022      cycles                    #    3.888 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    97,931,253,798      instructions              #    2.06  insns per cycle
    12,071,326,705      branches                  #  986.513 M/sec
       405,494,795      branch-misses             #    3.36% of all branches

      12.237069396 seconds time elapsed  ( +-  0.03% )

32/new/clang36

 Performance counter stats for '32/new/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12797.648433      task-clock (msec)         #    0.999 CPUs utilized
                48      context-switches          #    0.004 K/sec
                13      cpu-migrations            #    0.001 K/sec
           227,338      page-faults               #    0.018 M/sec
    49,796,249,373      cycles                    #    3.889 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   104,797,177,769      instructions              #    2.10  insns per cycle
    13,215,968,621      branches                  # 1032.091 M/sec
       366,726,222      branch-misses             #    2.77% of all branches

      12.805774319 seconds time elapsed  ( +-  0.05% )

32/new/gcc61

 Performance counter stats for '32/new/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      12428.269191      task-clock (msec)         #    0.997 CPUs utilized
                49      context-switches          #    0.004 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,340      page-faults               #    0.018 M/sec
    48,358,969,133      cycles                    #    3.880 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    98,136,911,028      instructions              #    2.02  insns per cycle
    10,696,006,256      branches                  #  858.219 M/sec
       408,904,237      branch-misses             #    3.82% of all branches

      12.463702050 seconds time elapsed  ( +-  0.33% )

32/old/gcc54

 Performance counter stats for '32/old/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11954.919791      task-clock (msec)         #    1.000 CPUs utilized
               110      context-switches          #    0.009 K/sec
                15      cpu-migrations            #    0.001 K/sec
           227,373      page-faults               #    0.019 M/sec
    46,516,597,361      cycles                    #    3.892 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    89,252,297,095      instructions              #    1.92  insns per cycle
     8,961,771,445      branches                  #  749.819 M/sec
       304,285,004      branch-misses             #    3.40% of all branches

      11.952728265 seconds time elapsed  ( +-  0.02% )

32/old/gcc49

 Performance counter stats for '32/old/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11085.757702      task-clock (msec)         #    1.000 CPUs utilized
                73      context-switches          #    0.007 K/sec
                14      cpu-migrations            #    0.001 K/sec
           227,413      page-faults               #    0.021 M/sec
    43,134,838,159      cycles                    #    3.892 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    88,024,029,162      instructions              #    2.04  insns per cycle
    10,761,240,465      branches                  #  970.984 M/sec
       304,643,015      branch-misses             #    2.83% of all branches

      11.083579198 seconds time elapsed  ( +-  0.04% )

32/old/clang36

 Performance counter stats for '32/old/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11768.320951      task-clock (msec)         #    1.000 CPUs utilized
                47      context-switches          #    0.004 K/sec
                14      cpu-migrations            #    0.001 K/sec
           227,340      page-faults               #    0.019 M/sec
    45,791,049,994      cycles                    #    3.891 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    91,156,768,393      instructions              #    1.99  insns per cycle
     9,259,527,722      branches                  #  786.744 M/sec
       337,959,931      branch-misses             #    3.65% of all branches

      11.770035225 seconds time elapsed  ( +-  0.04% )

32/old/gcc61

 Performance counter stats for '32/old/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11838.046498      task-clock (msec)         #    0.999 CPUs utilized
                51      context-switches          #    0.004 K/sec
                16      cpu-migrations            #    0.001 K/sec
           227,340      page-faults               #    0.019 M/sec
    46,062,387,288      cycles                    #    3.887 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    88,756,281,044      instructions              #    1.92  insns per cycle
     8,960,323,747      branches                  #  756.084 M/sec
       304,404,505      branch-misses             #    3.40% of all branches

      11.851566764 seconds time elapsed  ( +-  0.08% )

64/new_nounlikely/gcc54

 Performance counter stats for '64/new_nounlikely/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9582.563876      task-clock (msec)         #    0.999 CPUs utilized
                44      context-switches          #    0.005 K/sec
                12      cpu-migrations            #    0.001 K/sec
             4,048      page-faults               #    0.422 K/sec
    37,286,155,229      cycles                    #    3.886 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,622,804,382      instructions              #    2.19  insns per cycle
    10,525,334,915      branches                  # 1097.037 M/sec
       398,803,380      branch-misses             #    3.79% of all branches

       9.594918002 seconds time elapsed  ( +-  0.22% )

64/new_nounlikely/gcc49

 Performance counter stats for '64/new_nounlikely/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9635.749561      task-clock (msec)         #    1.002 CPUs utilized
                83      context-switches          #    0.009 K/sec
                13      cpu-migrations            #    0.001 K/sec
             4,055      page-faults               #    0.422 K/sec
    37,492,532,992      cycles                    #    3.897 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,441,853,787      instructions              #    2.18  insns per cycle
    10,517,380,925      branches                  # 1093.320 M/sec
       405,506,331      branch-misses             #    3.86% of all branches

       9.620367022 seconds time elapsed  ( +-  0.09% )

64/new_nounlikely/clang36

 Performance counter stats for '64/new_nounlikely/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9798.931302      task-clock (msec)         #    0.999 CPUs utilized
                73      context-switches          #    0.007 K/sec
                15      cpu-migrations            #    0.002 K/sec
             3,948      page-faults               #    0.403 K/sec
    38,127,670,239      cycles                    #    3.888 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    79,562,090,611      instructions              #    2.09  insns per cycle
    11,069,571,686      branches                  # 1128.806 M/sec
       437,816,175      branch-misses             #    3.96% of all branches

       9.807219620 seconds time elapsed  ( +-  0.14% )

64/new_nounlikely/gcc61

 Performance counter stats for '64/new_nounlikely/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9533.279523      task-clock (msec)         #    1.000 CPUs utilized
                40      context-switches          #    0.004 K/sec
                14      cpu-migrations            #    0.001 K/sec
             4,050      page-faults               #    0.425 K/sec
    37,094,411,520      cycles                    #    3.891 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,100,107,128      instructions              #    2.19  insns per cycle
    10,525,323,092      branches                  # 1104.000 M/sec
       398,625,074      branch-misses             #    3.79% of all branches

       9.534586166 seconds time elapsed  ( +-  0.03% )

64/new/gcc54

 Performance counter stats for '64/new/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9733.050995      task-clock (msec)         #    1.003 CPUs utilized
                36      context-switches          #    0.004 K/sec
                14      cpu-migrations            #    0.001 K/sec
             4,046      page-faults               #    0.417 K/sec
    37,871,767,362      cycles                    #    3.903 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,818,962,213      instructions              #    2.17  insns per cycle
    10,525,464,124      branches                  # 1084.730 M/sec
       400,972,010      branch-misses             #    3.81% of all branches

       9.704087918 seconds time elapsed  ( +-  0.05% )

64/new/gcc49

 Performance counter stats for '64/new/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9670.004658      task-clock (msec)         #    0.999 CPUs utilized
                31      context-switches          #    0.003 K/sec
                16      cpu-migrations            #    0.002 K/sec
             4,052      page-faults               #    0.419 K/sec
    37,626,463,817      cycles                    #    3.886 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,325,740,768      instructions              #    2.16  insns per cycle
    10,517,188,832      branches                  # 1086.327 M/sec
       402,289,456      branch-misses             #    3.83% of all branches

       9.681921377 seconds time elapsed  ( +-  0.07% )

64/new/clang36

 Performance counter stats for '64/new/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9784.510122      task-clock (msec)         #    0.999 CPUs utilized
                60      context-switches          #    0.006 K/sec
                15      cpu-migrations            #    0.002 K/sec
             3,949      page-faults               #    0.403 K/sec
    38,071,695,884      cycles                    #    3.888 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    79,561,608,177      instructions              #    2.09  insns per cycle
    11,069,485,210      branches                  # 1130.514 M/sec
       437,634,009      branch-misses             #    3.95% of all branches

       9.792169837 seconds time elapsed  ( +-  0.03% )

64/new/gcc61

 Performance counter stats for '64/new/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

       9547.000932      task-clock (msec)         #    0.999 CPUs utilized
                55      context-switches          #    0.006 K/sec
                15      cpu-migrations            #    0.002 K/sec
             4,050      page-faults               #    0.424 K/sec
    37,147,647,719      cycles                    #    3.887 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    81,723,264,608      instructions              #    2.20  insns per cycle
    10,525,926,307      branches                  # 1101.281 M/sec
       399,042,939      branch-misses             #    3.79% of all branches

       9.558567796 seconds time elapsed  ( +-  0.06% )

64/old/gcc54

 Performance counter stats for '64/old/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11701.057602      task-clock (msec)         #    1.000 CPUs utilized
                86      context-switches          #    0.007 K/sec
                16      cpu-migrations            #    0.001 K/sec
             4,049      page-faults               #    0.346 K/sec
    45,528,929,499      cycles                    #    3.890 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    83,344,806,490      instructions              #    1.83  insns per cycle
     8,862,891,327      branches                  #  757.275 M/sec
       304,064,685      branch-misses             #    3.43% of all branches

      11.704448398 seconds time elapsed  ( +-  0.02% )

64/old/gcc49

 Performance counter stats for '64/old/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11713.571021      task-clock (msec)         #    1.002 CPUs utilized
                49      context-switches          #    0.004 K/sec
                17      cpu-migrations            #    0.001 K/sec
             4,051      page-faults               #    0.347 K/sec
    45,578,035,547      cycles                    #    3.901 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    83,670,505,769      instructions              #    1.84  insns per cycle
     8,862,639,663      branches                  #  758.480 M/sec
       304,943,141      branch-misses             #    3.44% of all branches

      11.685292368 seconds time elapsed  ( +-  0.03% )

64/old/clang36

 Performance counter stats for '64/old/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11065.920232      task-clock (msec)         #    0.997 CPUs utilized
                37      context-switches          #    0.003 K/sec
                14      cpu-migrations            #    0.001 K/sec
             3,947      page-faults               #    0.356 K/sec
    43,058,056,621      cycles                    #    3.880 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    80,714,260,277      instructions              #    1.87  insns per cycle
     8,683,175,937      branches                  #  782.490 M/sec
       342,161,709      branch-misses             #    3.94% of all branches

      11.097386913 seconds time elapsed  ( +-  0.08% )

64/old/gcc61

 Performance counter stats for '64/old/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):

      11658.329412      task-clock (msec)         #    0.998 CPUs utilized
                38      context-switches          #    0.003 K/sec
                15      cpu-migrations            #    0.001 K/sec
             4,048      page-faults               #    0.347 K/sec
    45,363,173,986      cycles                    #    3.885 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    82,886,501,691      instructions              #    1.82  insns per cycle
     8,862,592,771      branches                  #  759.064 M/sec
       301,606,151      branch-misses             #    3.40% of all branches

      11.676273258 seconds time elapsed  ( +-  0.11% )