http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46228
--- Comment #6 from Zeev Tarantov <zeev.tarantov at gmail dot com> 2010-10-29 23:44:49 UTC ---

Setting -finline-limit high didn't produce different code.

This function:

  4007f8: 48 8b 77 10       mov 0x10(%rdi),%rsi
  4007fc: e9 c5 ff ff ff    jmpq 4007c6 <std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_erase(std::_Rb_tree_node<int>*)>
  400801: 90                nop

is called 100% of the runs of the program. The code is 10 bytes for the function and 5 bytes to call it, altogether 15 bytes. Inlined it would be 9 bytes. I don't count debug data for extra symbol and potential slowdown of the call to code that might not be in cache.

If this decision is arbitrary and won't be changed, too bad. But please someone explain the purpose of this code:

  400939: 89 54 24 18       mov %edx,0x18(%rsp)
  40093d: 8a 54 24 18       mov 0x18(%rsp),%dl
  400941: 88 54 24 28       mov %dl,0x28(%rsp)
  400945: 8b 54 24 28       mov 0x28(%rsp),%edx
  400949: 48 83 c4 48       add $0x48,%rsp
  40094d: c3                retq

The writes to stack slots that are about to be wiped. The net result of leaving "%edx & 0xff" in %edx using 16 bytes of code. The 72 bytes of stack that are allocated and unused. How is that in any way good?

And the pair variable in main that is also saved to the stack without ever being read later. Why doesn't the compiler eliminate writes to memory that is never read from, and then eliminate the stack slot itself?

> Oh this is not really a regression

Is it your opinion that the code produced by gcc 4.5.1 is as good as the code produced by gcc 4.4.5 (and clang 2.8)?

  section    size
  .text       952
  .text       856
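For readers without the attachment at hand, here is a rough sketch of the kind of source that could lead to listings like the ones above. It is an assumption, not the test case from this report: the symbol names only indicate that a std::set<int> is involved, and the "pair variable" is taken to be the std::pair<iterator, bool> returned by insert(). The function name insert_once is made up for illustration.

```cpp
// Hypothetical reduction; NOT the test case attached to the bug report.
#include <set>
#include <utility>

// Returns true when `value` was newly inserted into `s`.
// std::set<int>::insert returns std::pair<iterator, bool>; only the bool
// member is used here, so the rest of the pair never needs to be read back
// after the call returns.
bool insert_once(std::set<int>& s, int value)
{
    std::pair<std::set<int>::iterator, bool> result = s.insert(value);
    return result.second;
}
```

Compiling a file like this with optimization enabled (for example g++ -O2 -c) and disassembling it with objdump -d is one way to check whether a given compiler version still spills the pair to the stack. The exact output depends on the compiler version and flags, which are not stated above.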