https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84327
--- Comment #1 from xyzdr4gon333 at googlemail dot com ---

This bug becomes more important for the actual real-life example, which runs slower at -O2 than at -O1! In the earlier attached file you only have to replace the `interleaveZeros` function with this one:

unsigned int interleaveTwoZeros( unsigned int n )
{
    n &= 0x000003ff;
    n = (n ^ (n << 16)) & 0xFF0000FF;
    n = (n ^ (n << 8)) & 0x0300F00F;
    n = (n ^ (n << 4)) & 0x030C30C3;
    n = (n ^ (n << 2)) & 0x09249249;
    return n;
}

I.e. only the constants differ, nothing else! The timings:

 1234567890 iterations took 19.151s and resulted in 806157809
-O0 1234567890 iterations took 19.1547s and resulted in 1772082360
-O1 1234567890 iterations took 5.69619s and resulted in 2085417644
-O2 1234567890 iterations took 6.21504s and resulted in 32256352
-O3 1234567890 iterations took 6.14414s and resulted in 357018037

Not sure if this is worth another bug. I can reproduce this with the following compiler versions:

for GPP in g++-4.9 g++-5 g++-6 g++-7 g++-8; do
    $GPP --version | head -1
    for flag in ' ' -O0 -O1 -O2 -O3; do
        echo -n "$flag "
        $GPP $flag -std=c++11 optimizeFlags.cpp && ./a.out
    done
done

g++-4.9 (Debian 4.9.4-2) 4.9.4
 1234567890 iterations took 19.1979s and resulted in 1918993912
-O0 1234567890 iterations took 19.1785s and resulted in 710267642
-O1 1234567890 iterations took 5.6609s and resulted in 1898524753
-O2 1234567890 iterations took 5.71375s and resulted in 1117037030
-O3 1234567890 iterations took 5.67933s and resulted in 1451088646

g++-5 (Debian 5.5.0-8) 5.5.0 20171010
 1234567890 iterations took 19.2387s and resulted in 999898210
-O0 1234567890 iterations took 19.1464s and resulted in 1358121256
-O1 1234567890 iterations took 5.64181s and resulted in 642760018
-O2 1234567890 iterations took 5.65094s and resulted in 191105767
-O3 1234567890 iterations took 5.68849s and resulted in 1555980094

g++-6 (Debian 6.4.0-12) 6.4.0 20180123
 1234567890 iterations took 19.1786s and resulted in 1613186065
-O0 1234567890 iterations took 19.2001s and resulted in 424276129
-O1 1234567890 iterations took 5.73263s and resulted in 1828427433
-O2 1234567890 iterations took 6.16005s and resulted in 814826690
-O3 1234567890 iterations took 6.1438s and resulted in 867162058

g++-7 (Debian 7.3.0-3) 7.3.0
 1234567890 iterations took 19.1302s and resulted in 1147954921
-O0 1234567890 iterations took 19.1694s and resulted in 734785107
-O1 1234567890 iterations took 5.72652s and resulted in 1133709951
-O2 1234567890 iterations took 6.15633s and resulted in 352136223
-O3 1234567890 iterations took 6.14089s and resulted in 1468150013

g++-8 (Debian 8-20180207-2) 8.0.1 20180207 (experimental) [trunk revision 257435]
 1234567890 iterations took 19.1278s and resulted in 694826541
-O0 1234567890 iterations took 19.1454s and resulted in 249938642
-O1 1234567890 iterations took 5.72959s and resulted in 365780913
-O2 1234567890 iterations took 6.20064s and resulted in 2033700921
-O3 1234567890 iterations took 6.12829s and resulted in 1244532281

=> This seems to be a regression since g++-6: with g++-4.9 and g++-5, -O2 and -O3 are as fast as -O1, while from g++-6 on they take about 6.1 to 6.2 s instead of about 5.7 s.

Compiling with -O1 plus the additional flags that -O2 enables also reproduces the weird slowdown:

g++ -O1 "${O2Flags[@]}" -std=c++11 optimizeFlags.cpp && ./a.out
=> 6.16161s

Bisecting those additional -O2 flags narrows it down to -finline-small-functions ... I will open another bug for this.