https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121679
Bug ID: 121679
Summary: Much better code at -O1 than at -O2
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: rearnsha at gcc dot gnu.org
Target Milestone: ---
#include <random>
struct A
{
int a;
int b;
const char *s;
uint8_t array[64];
};
static A f1(int a, int b)
{
A obj {
.a = a + b,
.b = b * a,
.s = (a < b) ? "hello": "world",
.array = {0}
};
return obj;
}
int main(void)
{
std::mt19937 gen32;
A a = f1 (gen32(), gen32());
return 0;
}
Compiled as C++ with (where <optimize> is -O1 or -O2):
gcc -std=c++23 <optimize> -S test.cc
When compiled at -O1, this code is optimized away entirely, leaving main as just
"return 0;", but at -O2 a significant chunk of it, including the calls into
gen32, remains.
The relevant pass seems to be dse1:
At -O1:
;; Function main (main, funcdef_no=3127, decl_uid=61567, cgraph_uid=803, symbol_order=1346)
Deleted dead store: a = f1 (_4, _2); [return slot optimization]
Deleted trivially dead stmt: _4 = (int) _11;
Deleted dead store: _11 = std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::operator() (&gen32);
Deleted trivially dead stmt: _2 = (int) _9;
Deleted dead store: _9 = std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::operator() (&gen32);
Deleted dead store: std::mersenne_twister_engine<long unsigned int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18, 1812433253>::seed (&gen32, 5489);
Deleted dead store: MEM[(struct mersenne_twister_engine *)&gen32] ={v} {CLOBBER};
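The long template name in these dump lines is simply std::mt19937 written out in full, and the seed (&gen32, 5489) call is what its default constructor reduces to. The following C++17 sketch checks both points; the alias dumped_engine and the helper default_matches_seed_5489 are hypothetical names used only for illustration, and the "long unsigned int" in the dump is assumed to be std::uint_fast32_t on an LP64 target.

```cpp
#include <cstdint>
#include <random>
#include <type_traits>

// Template arguments copied from the dse1 dump. std::mt19937 is specified
// with std::uint_fast32_t as its result type; the dump prints
// "long unsigned int" because that is uint_fast32_t on an (assumed) LP64 target.
using dumped_engine = std::mersenne_twister_engine<
    std::uint_fast32_t, 32, 624, 397, 31, 2567483615u, 11, 4294967295u, 7,
    2636928640u, 15, 4022730752u, 18, 1812433253u>;

static_assert (std::is_same_v<dumped_engine, std::mt19937>,
               "the engine in the dump is std::mt19937");

// The seed (&gen32, 5489) call in the dump comes from the default
// constructor: 5489 is std::mt19937::default_seed.
static_assert (std::mt19937::default_seed == 5489u);

// True if a default-constructed engine has the same state as one that
// was explicitly seeded with 5489.
bool default_matches_seed_5489 ()
{
  std::mt19937 by_default;
  std::mt19937 by_seed (5489u);
  return by_default == by_seed;
}
```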
But at -O2:
Deleted dead store: a.s = iftmp.3_20;
Deleted trivially dead PHI: iftmp.3_20 = PHI <"hello"(3), "world"(4)>
Deleted dead store: a.b = _19;
Deleted trivially dead stmt: _19 = _2 * _4;
Deleted dead store: a.a = _18;
Deleted trivially dead stmt: _18 = _2 + _4;
Deleted dead store: MEM <char[64]> [(struct A *)&a + 16B] = {};
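The "+ 16B" in the last store above is the offset of array inside A, so that store is the .array = {0} initializer. A small sketch that checks the layout, assuming an LP64 target (4-byte int, 8-byte and 8-byte-aligned pointers), which is what the 16-byte offset in the dump implies; array_offset and array_size are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// Same layout as the struct in the testcase.
struct A
{
  int a;
  int b;
  const char *s;
  std::uint8_t array[64];
};

// On an LP64 target (assumed):
//   a      at offset 0   (4 bytes)
//   b      at offset 4   (4 bytes)
//   s      at offset 8   (8-byte pointer)
//   array  at offset 16  (64 bytes)
// so MEM <char[64]> [(struct A *)&a + 16B] = {} zeroes exactly 'array'.
constexpr std::size_t array_offset = offsetof (A, array);
constexpr std::size_t array_size = sizeof (A::array);
```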
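It seems that at -O2 the call to f1() is inlined into main() before dse1 runs
(at -O1 the call is still present in the dse1 dump), and that this earlier
inlining prevents the now-dead generator calls from being optimized away later.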
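To make the inlining hypothesis concrete: at -O1, dse1 sees a single aggregate store a = f1 (...), while at -O2 it sees the separate field stores plus a branch on a < b feeding the PHI. The following is a hand-written C++ approximation of main after f1() has been inlined, not actual GCC output; the name main_after_inlining is hypothetical. C++ leaves the order of the two gen32() argument evaluations unspecified, so the sketch simply picks one, and it does the arithmetic in unsigned form because f1's int addition and multiplication can overflow for arbitrary generator outputs.

```cpp
#include <cstdint>
#include <random>

struct A
{
  int a;
  int b;
  const char *s;
  std::uint8_t array[64];
};

// Hand-written approximation (not compiler output) of main() once f1()
// has been inlined into it. Returns what main returns.
int main_after_inlining ()
{
  std::mt19937 gen32;             // default constructor: seed (5489)
  int a_arg = gen32 ();           // order of the two calls is unspecified
  int b_arg = gen32 ();           // in f1 (gen32(), gen32())

  // f1 computes these in int, which can overflow; unsigned arithmetic
  // keeps this sketch well-defined (conversion back to int wraps in C++20).
  unsigned ua = static_cast<unsigned> (a_arg);
  unsigned ub = static_cast<unsigned> (b_arg);

  A a;
  a.a = static_cast<int> (ua + ub);   // corresponds to a.a = _18
  a.b = static_cast<int> (ub * ua);   // corresponds to a.b = _19

  const char *str;
  if (a_arg < b_arg)              // two arms merging into the PHI
    str = "hello";
  else
    str = "world";
  a.s = str;                      // corresponds to a.s = iftmp.3_20

  for (std::uint8_t &e : a.array) // the 64-byte clear at offset 16
    e = 0;

  (void) a;                       // 'a' is never read: all stores above are dead
  return 0;
}
```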