OK, it seems the improvement depends heavily on the compiler version and target.
I did some more measurements, and the results are rather mixed.

All tests decode a 1080p 175 Mbit/s DNxHD file on an i7-4770K. The
following table shows the change in overall decoding performance of the new
code relative to the old code (as measured by perf stat, averaged over 10
runs; positive values mean faster decoding):
build            | new    | new without unlikely
-----------------+--------+---------------------
32-bit gcc 4.9   |  -9.4% | -11.2%
32-bit gcc 5.4   |  -8.4% |  -0.9%
32-bit gcc 6.1   |  -4.9% |  -2.6%
32-bit clang 3.6 |  -8.1% |  -8.4%
64-bit gcc 4.9   | +20.7% | +21.5%
64-bit gcc 5.4   | +20.6% | +22.0%
64-bit gcc 6.1   | +22.1% | +22.5%
64-bit clang 3.6 | +13.3% | +13.2%
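For anyone checking the arithmetic: assuming each percentage is the relative speedup t_old / t_new - 1, with t the mean "seconds time elapsed" that perf stat reports for each build, the table can be recomputed from the attached output. This is only a sketch; the elapsed times are copied by hand from the attachments, and `speedup_pct` and `summary` are illustrative names. The recomputed values agree with the table to within 0.1 percentage points.

```python
"""Recompute the summary table from the attached perf stat output.

Assumption: the table shows t_old / t_new - 1, where t is the mean
"seconds time elapsed" over 10 runs. Positive means the new code is faster.
"""

# Mean elapsed times in seconds, copied from the attached perf output.
ELAPSED = {
    "32-bit gcc 4.9":   {"old": 11.083579198, "new": 12.237069396, "new_nounlikely": 12.478492178},
    "32-bit gcc 5.4":   {"old": 11.952728265, "new": 13.055509559, "new_nounlikely": 12.065776540},
    "32-bit gcc 6.1":   {"old": 11.851566764, "new": 12.463702050, "new_nounlikely": 12.171179979},
    "32-bit clang 3.6": {"old": 11.770035225, "new": 12.805774319, "new_nounlikely": 12.849132285},
    "64-bit gcc 4.9":   {"old": 11.685292368, "new": 9.681921377, "new_nounlikely": 9.620367022},
    "64-bit gcc 5.4":   {"old": 11.704448398, "new": 9.704087918, "new_nounlikely": 9.594918002},
    "64-bit gcc 6.1":   {"old": 11.676273258, "new": 9.558567796, "new_nounlikely": 9.534586166},
    "64-bit clang 3.6": {"old": 11.097386913, "new": 9.792169837, "new_nounlikely": 9.807219620},
}


def speedup_pct(t_old: float, t_new: float) -> float:
    """Relative change in decoding speed in percent (positive = faster)."""
    return (t_old / t_new - 1.0) * 100.0


def summary() -> dict[str, tuple[float, float]]:
    """Map each build to its (new, new without unlikely) speedups in percent."""
    return {
        build: (speedup_pct(t["old"], t["new"]),
                speedup_pct(t["old"], t["new_nounlikely"]))
        for build, t in ELAPSED.items()
    }


if __name__ == "__main__":
    for build, (new, nounlikely) in summary().items():
        print(f"{build:<16} | {new:+6.1f}% | {nounlikely:+6.1f}%")
```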
Full perf output from those runs is attached if anyone wants to see it.
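A rough Python sketch for re-running or post-processing these measurements. It assumes Linux `perf` is installed, that the decoder builds live at paths like `32/new/gcc54` (as in the attachment labels), and that the counter lines look like the ones attached (other perf versions may format them differently). `perf stat -r N` repeats the command N times and prints the averaged report on stderr. `perf_command`, `parse_perf_stat` and `run_perf` are hypothetical helpers, not part of libav or perf.

```python
import re
import subprocess

# Decoder arguments used for every run in this thread.
DECODER_ARGS = ["-threads", "1", "-i", "out.mov", "-f", "null",
                "-frames", "1000", "-v", "fatal", "-"]

# "<value> <event-name>" at the start of a line, e.g. "46,803,400,832 cycles # 3.879 GHz"
COUNTER_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+([a-z][\w-]*)", re.MULTILINE)
# "12.065776540 seconds time elapsed", optionally followed by "( +- 0.30% )"
ELAPSED_RE = re.compile(
    r"([\d.]+)\s+seconds time elapsed(?:\s*\(\s*\+-\s*([\d.]+)%\s*\))?")


def perf_command(binary: str, runs: int = 10) -> list[str]:
    """Build a `perf stat -r <runs>` command line for one decoder build."""
    return ["perf", "stat", "-r", str(runs), binary, *DECODER_ARGS]


def parse_perf_stat(text: str) -> dict[str, float]:
    """Extract event counters and the elapsed time from perf stat text output."""
    stats: dict[str, float] = {}
    for value, name in COUNTER_RE.findall(text):
        if name != "seconds":  # the elapsed-time line is handled below
            stats[name] = float(value.replace(",", ""))
    m = ELAPSED_RE.search(text)
    if m:
        stats["elapsed"] = float(m.group(1))
        if m.group(2) is not None:
            # the "( +- x% )" spread perf reports across the repeated runs
            stats["elapsed_spread_pct"] = float(m.group(2))
    return stats


def run_perf(binary: str, runs: int = 10) -> dict[str, float]:
    """Run perf stat on one build and parse its report (written to stderr)."""
    proc = subprocess.run(perf_command(binary, runs),
                          capture_output=True, text=True, check=True)
    return parse_perf_stat(proc.stderr)
```

Parsing the text rather than using perf's machine-readable modes keeps the sketch usable on the already-captured attachments.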
--
Anton Khirnov
32/new_nounlikely/gcc54
Performance counter stats for '32/new_nounlikely/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12028.489329 task-clock (msec) # 0.997 CPUs utilized
47 context-switches # 0.004 K/sec
15 cpu-migrations # 0.001 K/sec
227,375 page-faults # 0.019 M/sec
46,803,400,832 cycles # 3.879 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
97,411,737,312 instructions # 2.08 insns per cycle
10,809,507,723 branches # 895.930 M/sec
402,696,922 branch-misses # 3.73% of all branches
12.065776540 seconds time elapsed ( +- 0.30% )

32/new_nounlikely/gcc49
Performance counter stats for '32/new_nounlikely/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12481.023731 task-clock (msec) # 1.000 CPUs utilized
50 context-switches # 0.004 K/sec
13 cpu-migrations # 0.001 K/sec
227,412 page-faults # 0.018 M/sec
48,564,231,164 cycles # 3.892 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
97,812,967,530 instructions # 2.01 insns per cycle
12,071,782,437 branches # 967.466 M/sec
402,854,408 branch-misses # 3.34% of all branches
12.478492178 seconds time elapsed ( +- 0.03% )

32/new_nounlikely/clang36
Performance counter stats for '32/new_nounlikely/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12815.982410 task-clock (msec) # 0.997 CPUs utilized
52 context-switches # 0.004 K/sec
15 cpu-migrations # 0.001 K/sec
227,339 page-faults # 0.018 M/sec
49,867,576,109 cycles # 3.881 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
104,793,898,107 instructions # 2.10 insns per cycle
13,215,236,459 branches # 1028.545 M/sec
367,222,477 branch-misses # 2.78% of all branches
12.849132285 seconds time elapsed ( +- 0.27% )

32/new_nounlikely/gcc61
Performance counter stats for '32/new_nounlikely/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12287.806127 task-clock (msec) # 1.010 CPUs utilized
48 context-switches # 0.004 K/sec
15 cpu-migrations # 0.001 K/sec
227,339 page-faults # 0.019 M/sec
47,812,414,857 cycles # 3.929 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
97,019,067,015 instructions # 2.05 insns per cycle
10,808,732,579 branches # 888.112 M/sec
405,868,186 branch-misses # 3.75% of all branches
12.171179979 seconds time elapsed ( +- 0.22% )

32/new/gcc54
Performance counter stats for '32/new/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
13006.111334 task-clock (msec) # 0.996 CPUs utilized
109 context-switches # 0.008 K/sec
15 cpu-migrations # 0.001 K/sec
227,375 page-faults # 0.017 M/sec
50,606,704,528 cycles # 3.877 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
101,124,583,693 instructions # 1.99 insns per cycle
10,733,335,664 branches # 822.194 M/sec
409,887,262 branch-misses # 3.82% of all branches
13.055509559 seconds time elapsed ( +- 0.20% )

32/new/gcc49
Performance counter stats for '32/new/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12226.157533 task-clock (msec) # 0.999 CPUs utilized
48 context-switches # 0.004 K/sec
14 cpu-migrations # 0.001 K/sec
227,414 page-faults # 0.019 M/sec
47,572,541,022 cycles # 3.888 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
97,931,253,798 instructions # 2.06 insns per cycle
12,071,326,705 branches # 986.513 M/sec
405,494,795 branch-misses # 3.36% of all branches
12.237069396 seconds time elapsed ( +- 0.03% )

32/new/clang36
Performance counter stats for '32/new/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12797.648433 task-clock (msec) # 0.999 CPUs utilized
48 context-switches # 0.004 K/sec
13 cpu-migrations # 0.001 K/sec
227,338 page-faults # 0.018 M/sec
49,796,249,373 cycles # 3.889 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
104,797,177,769 instructions # 2.10 insns per cycle
13,215,968,621 branches # 1032.091 M/sec
366,726,222 branch-misses # 2.77% of all branches
12.805774319 seconds time elapsed ( +- 0.05% )

32/new/gcc61
Performance counter stats for '32/new/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
12428.269191 task-clock (msec) # 0.997 CPUs utilized
49 context-switches # 0.004 K/sec
15 cpu-migrations # 0.001 K/sec
227,340 page-faults # 0.018 M/sec
48,358,969,133 cycles # 3.880 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
98,136,911,028 instructions # 2.02 insns per cycle
10,696,006,256 branches # 858.219 M/sec
408,904,237 branch-misses # 3.82% of all branches
12.463702050 seconds time elapsed ( +- 0.33% )

32/old/gcc54
Performance counter stats for '32/old/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11954.919791 task-clock (msec) # 1.000 CPUs utilized
110 context-switches # 0.009 K/sec
15 cpu-migrations # 0.001 K/sec
227,373 page-faults # 0.019 M/sec
46,516,597,361 cycles # 3.892 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
89,252,297,095 instructions # 1.92 insns per cycle
8,961,771,445 branches # 749.819 M/sec
304,285,004 branch-misses # 3.40% of all branches
11.952728265 seconds time elapsed ( +- 0.02% )

32/old/gcc49
Performance counter stats for '32/old/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11085.757702 task-clock (msec) # 1.000 CPUs utilized
73 context-switches # 0.007 K/sec
14 cpu-migrations # 0.001 K/sec
227,413 page-faults # 0.021 M/sec
43,134,838,159 cycles # 3.892 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
88,024,029,162 instructions # 2.04 insns per cycle
10,761,240,465 branches # 970.984 M/sec
304,643,015 branch-misses # 2.83% of all branches
11.083579198 seconds time elapsed ( +- 0.04% )

32/old/clang36
Performance counter stats for '32/old/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11768.320951 task-clock (msec) # 1.000 CPUs utilized
47 context-switches # 0.004 K/sec
14 cpu-migrations # 0.001 K/sec
227,340 page-faults # 0.019 M/sec
45,791,049,994 cycles # 3.891 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
91,156,768,393 instructions # 1.99 insns per cycle
9,259,527,722 branches # 786.744 M/sec
337,959,931 branch-misses # 3.65% of all branches
11.770035225 seconds time elapsed ( +- 0.04% )

32/old/gcc61
Performance counter stats for '32/old/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11838.046498 task-clock (msec) # 0.999 CPUs utilized
51 context-switches # 0.004 K/sec
16 cpu-migrations # 0.001 K/sec
227,340 page-faults # 0.019 M/sec
46,062,387,288 cycles # 3.887 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
88,756,281,044 instructions # 1.92 insns per cycle
8,960,323,747 branches # 756.084 M/sec
304,404,505 branch-misses # 3.40% of all branches
11.851566764 seconds time elapsed ( +- 0.08% )

64/new_nounlikely/gcc54
Performance counter stats for '64/new_nounlikely/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9582.563876 task-clock (msec) # 0.999 CPUs utilized
44 context-switches # 0.005 K/sec
12 cpu-migrations # 0.001 K/sec
4,048 page-faults # 0.422 K/sec
37,286,155,229 cycles # 3.886 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,622,804,382 instructions # 2.19 insns per cycle
10,525,334,915 branches # 1097.037 M/sec
398,803,380 branch-misses # 3.79% of all branches
9.594918002 seconds time elapsed ( +- 0.22% )

64/new_nounlikely/gcc49
Performance counter stats for '64/new_nounlikely/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9635.749561 task-clock (msec) # 1.002 CPUs utilized
83 context-switches # 0.009 K/sec
13 cpu-migrations # 0.001 K/sec
4,055 page-faults # 0.422 K/sec
37,492,532,992 cycles # 3.897 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,441,853,787 instructions # 2.18 insns per cycle
10,517,380,925 branches # 1093.320 M/sec
405,506,331 branch-misses # 3.86% of all branches
9.620367022 seconds time elapsed ( +- 0.09% )

64/new_nounlikely/clang36
Performance counter stats for '64/new_nounlikely/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9798.931302 task-clock (msec) # 0.999 CPUs utilized
73 context-switches # 0.007 K/sec
15 cpu-migrations # 0.002 K/sec
3,948 page-faults # 0.403 K/sec
38,127,670,239 cycles # 3.888 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
79,562,090,611 instructions # 2.09 insns per cycle
11,069,571,686 branches # 1128.806 M/sec
437,816,175 branch-misses # 3.96% of all branches
9.807219620 seconds time elapsed ( +- 0.14% )

64/new_nounlikely/gcc61
Performance counter stats for '64/new_nounlikely/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9533.279523 task-clock (msec) # 1.000 CPUs utilized
40 context-switches # 0.004 K/sec
14 cpu-migrations # 0.001 K/sec
4,050 page-faults # 0.425 K/sec
37,094,411,520 cycles # 3.891 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,100,107,128 instructions # 2.19 insns per cycle
10,525,323,092 branches # 1104.000 M/sec
398,625,074 branch-misses # 3.79% of all branches
9.534586166 seconds time elapsed ( +- 0.03% )

64/new/gcc54
Performance counter stats for '64/new/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9733.050995 task-clock (msec) # 1.003 CPUs utilized
36 context-switches # 0.004 K/sec
14 cpu-migrations # 0.001 K/sec
4,046 page-faults # 0.417 K/sec
37,871,767,362 cycles # 3.903 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,818,962,213 instructions # 2.17 insns per cycle
10,525,464,124 branches # 1084.730 M/sec
400,972,010 branch-misses # 3.81% of all branches
9.704087918 seconds time elapsed ( +- 0.05% )

64/new/gcc49
Performance counter stats for '64/new/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9670.004658 task-clock (msec) # 0.999 CPUs utilized
31 context-switches # 0.003 K/sec
16 cpu-migrations # 0.002 K/sec
4,052 page-faults # 0.419 K/sec
37,626,463,817 cycles # 3.886 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,325,740,768 instructions # 2.16 insns per cycle
10,517,188,832 branches # 1086.327 M/sec
402,289,456 branch-misses # 3.83% of all branches
9.681921377 seconds time elapsed ( +- 0.07% )

64/new/clang36
Performance counter stats for '64/new/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9784.510122 task-clock (msec) # 0.999 CPUs utilized
60 context-switches # 0.006 K/sec
15 cpu-migrations # 0.002 K/sec
3,949 page-faults # 0.403 K/sec
38,071,695,884 cycles # 3.888 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
79,561,608,177 instructions # 2.09 insns per cycle
11,069,485,210 branches # 1130.514 M/sec
437,634,009 branch-misses # 3.95% of all branches
9.792169837 seconds time elapsed ( +- 0.03% )

64/new/gcc61
Performance counter stats for '64/new/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
9547.000932 task-clock (msec) # 0.999 CPUs utilized
55 context-switches # 0.006 K/sec
15 cpu-migrations # 0.002 K/sec
4,050 page-faults # 0.424 K/sec
37,147,647,719 cycles # 3.887 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
81,723,264,608 instructions # 2.20 insns per cycle
10,525,926,307 branches # 1101.281 M/sec
399,042,939 branch-misses # 3.79% of all branches
9.558567796 seconds time elapsed ( +- 0.06% )

64/old/gcc54
Performance counter stats for '64/old/gcc54 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11701.057602 task-clock (msec) # 1.000 CPUs utilized
86 context-switches # 0.007 K/sec
16 cpu-migrations # 0.001 K/sec
4,049 page-faults # 0.346 K/sec
45,528,929,499 cycles # 3.890 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
83,344,806,490 instructions # 1.83 insns per cycle
8,862,891,327 branches # 757.275 M/sec
304,064,685 branch-misses # 3.43% of all branches
11.704448398 seconds time elapsed ( +- 0.02% )

64/old/gcc49
Performance counter stats for '64/old/gcc49 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11713.571021 task-clock (msec) # 1.002 CPUs utilized
49 context-switches # 0.004 K/sec
17 cpu-migrations # 0.001 K/sec
4,051 page-faults # 0.347 K/sec
45,578,035,547 cycles # 3.901 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
83,670,505,769 instructions # 1.84 insns per cycle
8,862,639,663 branches # 758.480 M/sec
304,943,141 branch-misses # 3.44% of all branches
11.685292368 seconds time elapsed ( +- 0.03% )

64/old/clang36
Performance counter stats for '64/old/clang36 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11065.920232 task-clock (msec) # 0.997 CPUs utilized
37 context-switches # 0.003 K/sec
14 cpu-migrations # 0.001 K/sec
3,947 page-faults # 0.356 K/sec
43,058,056,621 cycles # 3.880 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
80,714,260,277 instructions # 1.87 insns per cycle
8,683,175,937 branches # 782.490 M/sec
342,161,709 branch-misses # 3.94% of all branches
11.097386913 seconds time elapsed ( +- 0.08% )

64/old/gcc61
Performance counter stats for '64/old/gcc61 -threads 1 -i out.mov -f null -frames 1000 -v fatal -' (10 runs):
11658.329412 task-clock (msec) # 0.998 CPUs utilized
38 context-switches # 0.003 K/sec
15 cpu-migrations # 0.001 K/sec
4,048 page-faults # 0.347 K/sec
45,363,173,986 cycles # 3.885 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
82,886,501,691 instructions # 1.82 insns per cycle
8,862,592,771 branches # 759.064 M/sec
301,606,151 branch-misses # 3.40% of all branches
11.676273258 seconds time elapsed ( +- 0.11% )
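The "insns per cycle" and "% of all branches" annotations in the perf output are simple ratios of the raw counters. A rough Python sketch that recomputes them for the gcc 5.4 builds, together with the old-to-new change in instructions and cycles. The counter values are copied by hand from the output above, and `Counters` and `pct_change` are illustrative names. The recomputed ratios agree with perf's printed annotations to within about 0.01.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Averaged perf stat counters for one build (copied from the output above)."""
    cycles: int
    instructions: int
    branches: int
    branch_misses: int

    @property
    def ipc(self) -> float:
        """Instructions per cycle ("insns per cycle" in perf's output)."""
        return self.instructions / self.cycles

    @property
    def miss_rate_pct(self) -> float:
        """Branch misses as a percentage of all branches."""
        return 100.0 * self.branch_misses / self.branches


RUNS = {
    "32/old/gcc54": Counters(46_516_597_361, 89_252_297_095, 8_961_771_445, 304_285_004),
    "32/new/gcc54": Counters(50_606_704_528, 101_124_583_693, 10_733_335_664, 409_887_262),
    "64/old/gcc54": Counters(45_528_929_499, 83_344_806_490, 8_862_891_327, 304_064_685),
    "64/new/gcc54": Counters(37_871_767_362, 81_818_962_213, 10_525_464_124, 400_972_010),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    for bits in ("32", "64"):
        old, new = RUNS[f"{bits}/old/gcc54"], RUNS[f"{bits}/new/gcc54"]
        print(f"{bits}-bit gcc 5.4: "
              f"instructions {pct_change(old.instructions, new.instructions):+.1f}%, "
              f"cycles {pct_change(old.cycles, new.cycles):+.1f}%, "
              f"IPC {old.ipc:.2f} -> {new.ipc:.2f}, "
              f"branch misses {old.miss_rate_pct:.2f}% -> {new.miss_rate_pct:.2f}%")
```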
_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel