I know this sounds like it might be better answered in gcc-help, but
if I am right this is a bug report.
I'm using gcc 4.5 branch, rev. 165881 (a week old), on x86-64 Linux.
This testcase is derived from a larger program. I looked at the
generated assembly and was puzzled.
#include <set>
#include <stdio.h>

int main()
{
    static const int array[] = { 1,2,3,4,5,6,7,8,9,10,6 };
    std::set<int> the_set;
    int count = 0;
    for (unsigned i = 0; i < sizeof(array)/sizeof(*array); i++)
    {
        std::pair<std::set<int>::iterator, bool> result =
            the_set.insert(array[i]);
        if (result.second)
            count++;
    }
    printf("%d unique items in array.\n", count);
    return 0;
}
Compiled with g++ -std=c++98 -Os, this produces what looks to me like
very inefficient code.
Particularly this loop in main():
  40076d:  89 d8                 mov    %ebx,%eax
  40076f:  4c 89 e7              mov    %r12,%rdi
  400772:  48 8d 34 85 60 0a 40  lea    0x400a60(,%rax,4),%rsi
  400779:  00
  40077a:  e8 b1 01 00 00        callq  400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>
  40077f:  48 89 04 24           mov    %rax,(%rsp)
  400783:  89 54 24 08           mov    %edx,0x8(%rsp)
  400787:  48 89 44 24 40        mov    %rax,0x40(%rsp)
  40078c:  48 8b 44 24 08        mov    0x8(%rsp),%rax
  400791:  3c 01                 cmp    $0x1,%al
  400793:  48 89 44 24 48        mov    %rax,0x48(%rsp)
  400798:  83 dd ff              sbb    $0xffffffffffffffff,%ebp
  40079b:  ff c3                 inc    %ebx
  40079d:  83 fb 0b              cmp    $0xb,%ebx
  4007a0:  75 cb                 jne    40076d <main+0x19>
And the uninlined set::insert():
0000000000400930 <std::set<int, std::less<int>, std::allocator<int> >::insert(int const&)>:
  400930:  48 83 ec 48           sub    $0x48,%rsp
  400934:  e8 5f ff ff ff        callq  400898 <std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_unique(int const&)>
  400939:  89 54 24 18           mov    %edx,0x18(%rsp)
  40093d:  8a 54 24 18           mov    0x18(%rsp),%dl
  400941:  88 54 24 28           mov    %dl,0x28(%rsp)
  400945:  8b 54 24 28           mov    0x28(%rsp),%edx
  400949:  48 83 c4 48           add    $0x48,%rsp
  40094d:  c3                    retq
  40094e:  90                    nop
  40094f:  90                    nop
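A library-free sketch of the pattern that insert() appears to follow may help narrow this down. It assumes the cost comes from rebuilding a small pair (pointer plus bool, returned in %rax/%edx) with a different iterator type. Every name below (InnerIter, OuterIter, inner_insert, outer_insert, count_unique) is a hypothetical stand-in, not a libstdc++ type, and whether it triggers the same code generation is untested.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins: two distinct pointer-sized iterator types, so the
// returned pair has to be converted rather than passed through unchanged.
struct InnerIter { int* p; };
struct OuterIter {
    int* p;
    OuterIter(InnerIter it) : p(it.p) {}
};

// Tiny fixed-capacity "set" using linear search; enough for the testcase.
static const unsigned kCapacity = 16;
static int storage[kCapacity];
static unsigned used = 0;

// Plays the role of _M_insert_unique(): returns (position, inserted?).
__attribute__((noinline))
std::pair<InnerIter, bool> inner_insert(const int& v)
{
    for (unsigned i = 0; i < used; i++) {
        if (storage[i] == v) {
            InnerIter found = { &storage[i] };
            return std::pair<InnerIter, bool>(found, false);
        }
    }
    assert(used < kCapacity);  // the sketch does not grow its storage
    storage[used] = v;
    InnerIter added = { &storage[used] };
    used++;
    return std::pair<InnerIter, bool>(added, true);
}

// Plays the role of set::insert(): rebuilds the pair with the outer
// iterator type. Ideally this needs no stack frame at all.
__attribute__((noinline))
std::pair<OuterIter, bool> outer_insert(const int& v)
{
    std::pair<InnerIter, bool> r = inner_insert(v);
    return std::pair<OuterIter, bool>(OuterIter(r.first), r.second);
}

// Same counting loop as in main() above.
int count_unique(const int* values, unsigned n)
{
    used = 0;  // start from an empty container on every call
    int count = 0;
    for (unsigned i = 0; i < n; i++) {
        std::pair<OuterIter, bool> result = outer_insert(values[i]);
        if (result.second)
            count++;
    }
    return count;
}
```

Calling count_unique() on the array from the testcase from a small main(), building with g++ -std=c++98 -Os, and disassembling outer_insert() would show whether the stack traffic depends on std::set at all.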
In the larger program I got almost exactly these results using
"-fprofile-use -O3", but since I have reproduced it with -Os and
without PGO, maybe it will be easier to debug/optimize.
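The stack shuffling that accomplishes nothing and the unused stack
frames are strange: insert() reserves 0x48 bytes of stack only to
bounce %edx through two slots, and the loop in main() stores the
returned pair to the stack more than once, yet only 0x8(%rsp) is read
back in the loop shown.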
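For reference, the only part of the loop in main() that seems to do real work on the result is the cmp/sbb pair, which reads as a branchless form of "if (result.second) count++". The sketch below models that reading in C++. The function names are made up for illustration, and unsigned arithmetic is used so that wraparound is well defined, whereas count is a plain int in the testcase.

```cpp
#include <cassert>
#include <stdint.h>

// Models `cmp $0x1,%al` followed by `sbb $-1,%ebp`.
// cmp sets the carry flag when al < 1 (unsigned), i.e. when al == 0.
// sbb then computes count - (-1) - carry, i.e. count + 1 - carry.
uint32_t sbb_update(uint32_t count, uint8_t al)
{
    uint32_t carry = (al < 1u) ? 1u : 0u;
    return count - 0xffffffffu - carry;  // arithmetic wraps modulo 2^32
}

// The source-level statement the instruction pair appears to implement.
uint32_t source_update(uint32_t count, bool second)
{
    if (second)
        count++;
    return count;
}

// Checks that both models agree on a few boundary values.
void check_models_agree()
{
    const uint32_t counts[] = { 0u, 1u, 0xfffffffeu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof(counts)/sizeof(*counts); i++) {
        assert(sbb_update(counts[i], 0) == source_update(counts[i], false));
        assert(sbb_update(counts[i], 1) == source_update(counts[i], true));
    }
}
```

If that reading is right, the flag arrives in %edx from the call and the count update needs no memory access, so the loads and stores around it contribute nothing to the result.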
Am I reading the code wrong? Should I be using a different version of
the compiler? Is this a known bug?
Please advise.