https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121679

            Bug ID: 121679
           Summary: Much better code at -O1 than at -O2
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---

#include <cstdint>
#include <random>

struct A
{
    int a;
    int b;
    const char *s;
    uint8_t array[64];
};

static A  f1(int a, int b)
{
    A obj {
        .a = a + b,
        .b = b * a,
        .s = (a < b) ? "hello": "world",
        .array = {0}
    };
    return obj;
}

int main(void)
{
    std::mt19937 gen32;
    A a = f1 (gen32(), gen32());
    return 0;
}

g++ -std=c++23 <optimize> test.cc -S

When compiled at -O1 this code is entirely optimized away to "return 0;", but
at -O2 a significant chunk remains.

The relevant pass seems to be dse1:

At -O1:
;; Function main (main, funcdef_no=3127, decl_uid=61567, cgraph_uid=803,
symbol_order=1346)

  Deleted dead store: a = f1 (_4, _2); [return slot optimization]

  Deleted trivially dead stmt: _4 = (int) _11;

  Deleted dead store: _11 = std::mersenne_twister_engine<long unsigned int, 32,
624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18,
1812433253>::operator() (&gen32);

  Deleted trivially dead stmt: _2 = (int) _9;

  Deleted dead store: _9 = std::mersenne_twister_engine<long unsigned int, 32,
624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18,
1812433253>::operator() (&gen32);

  Deleted dead store: std::mersenne_twister_engine<long unsigned int, 32, 624,
397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15, 4022730752, 18,
1812433253>::seed (&gen32, 5489);

  Deleted dead store: MEM[(struct mersenne_twister_engine *)&gen32] ={v}
{CLOBBER};

But at -O2:
  Deleted dead store: a.s = iftmp.3_20;

  Deleted trivially dead PHI: iftmp.3_20 = PHI <"hello"(3), "world"(4)>

  Deleted dead store: a.b = _19;

  Deleted trivially dead stmt: _19 = _2 * _4;

  Deleted dead store: a.a = _18;

  Deleted trivially dead stmt: _18 = _2 + _4;

  Deleted dead store: MEM <char[64]> [(struct A *)&a + 16B] = {};

It seems that the earlier inlining of the call to f1() at -O2 is
preventing the later removal of the dead calls, i.e. the two gen32()
calls and the seed() call that dse1 deletes at -O1.
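
To make the hypothesis above concrete, here is a rough hand-inlined
version of main(), approximating what dse1 appears to be looking at at
-O2. This is only a sketch and not the actual GIMPLE: the function name
main_after_inlining, the locals x/y/ux/uy, the chosen argument
evaluation order, and the use of memset and unsigned arithmetic are all
assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <random>

struct A
{
    int a;
    int b;
    const char *s;
    std::uint8_t array[64];
};

// Hypothetical name: main() from the report with f1() inlined by hand.
int main_after_inlining()
{
    std::mt19937 gen32;                 // default seed is 5489

    // Argument evaluation order is unspecified; one order is picked here.
    int y = static_cast<int>(gen32());
    int x = static_cast<int>(gen32());

    // Unsigned arithmetic keeps this sketch free of signed-overflow UB;
    // the report's f1() uses plain int arithmetic.
    unsigned ux = static_cast<unsigned>(x);
    unsigned uy = static_cast<unsigned>(y);

    A a;                                      // never read afterwards
    a.a = static_cast<int>(ux + uy);          // dead store
    a.b = static_cast<int>(uy * ux);          // dead store
    a.s = (x < y) ? "hello" : "world";        // dead store
    std::memset(a.array, 0, sizeof a.array);  // dead store

    return 0;
}
```

Once the four stores into a are deleted, nothing uses x or y any more,
so the gen32() calls and the seeding of gen32 would be expected to be
removable as well, as they are at -O1.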
