https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85762

            Bug ID: 85762
           Summary: [8/9 Regression] range-v3 abstraction overhead not
                    optimized away
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Created attachment 44124
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44124&action=edit
preprocessed source code for run_range()

GCC 8 eliminates less of the abstraction overhead in the range-v3 library
than earlier versions did, as can be seen with the function

  #include <vector>
  #include <range/v3/all.hpp>

  long run_range(std::vector<int> const &lengths, long to_find)
  {
    auto const found_index = ranges::distance(lengths
            | ranges::view::transform(ranges::convert_to<long>{})
            | ranges::view::partial_sum()
            | ranges::view::take_while([=](auto const i) {
                  return !(to_find < i);
              }));
    return found_index;
  }
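
For comparison, the pipeline above should ideally reduce to something close
to a hand-written loop. The following is only a sketch of that loop as I read
the semantics of transform, partial_sum, take_while and distance. The name
run_plain is hypothetical and does not appear in the attached source.

```cpp
#include <vector>

// Hypothetical hand-written counterpart of run_range(); an illustration
// only, not code from the attachment or from range-v3.
long run_plain(std::vector<int> const &lengths, long to_find)
{
  long sum = 0;    // running total, as produced by partial_sum
  long index = 0;  // number of elements accepted by take_while
  for (int const len : lengths) {
    sum += len;          // int is widened to long, like convert_to<long>
    if (to_find < sum)   // the take_while predicate !(to_find < i) fails
      break;
    ++index;             // distance counts the elements that were taken
  }
  return index;
}
```

With lengths {1, 2, 3, 4} and to_find 5, the partial sums are 1, 3, 6, 10,
so two elements pass the predicate and the result is 2.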


GCC 7 compiled the loop to

  <bb 5> [10.87%]:
  # it$_M_current_41 = PHI <_6(4), _27(8)>
  # it$16_26 = PHI <it$16_24(4), _31(8)>
  _53 = to_find_2(D) < it$16_26;

  <bb 6> [100.00%]:
  # it$_M_current_23 = PHI <it$_M_current_41(5), _27(7)>
  _20 = _7 == it$_M_current_23;
  _5 = _20 | _53;
  if (_5 != 0)
    goto <bb 9>; [7.36%]
  else
    goto <bb 7>; [92.64%]

  <bb 7> [92.60%]:
  _27 = it$_M_current_23 + 4;
  if (_7 != _27)
    goto <bb 8>; [3.75%]
  else
    goto <bb 6>; [96.25%]

  <bb 8> [3.47%]:
  _29 = MEM[(const int &)it$_M_current_23 + 4];
  _30 = (long int) _29;
  _31 = it$16_26 + _30;
  goto <bb 5>; [100.00%]

  <bb 9> [7.36%]:
  _33 = (long int) it$_M_current_23;
  _34 = (long int) _6;
  _35 = _33 - _34;
  _36 = _35 /[ex] 4;
  return _36;

while the loop compiled by GCC 8 stores to and reloads from temporary
structures in each iteration (see <bb 7>)

  <bb 5> [local count: 1478210893]:
  # it_47 = PHI <SR.352_183(4), _64(8)>
  # it$16$sum__115 = PHI <SR.353_184(4), _67(8)>
  _42 = to_find_2(D) < it$16$sum__115;

  <bb 6> [local count: 1651554780]:
  # it_30 = PHI <it_47(5), _64(7)>
  _46 = it_30 == SR.355_137;
  _40 = _42 | _46;
  if (_40 != 0)
    goto <bb 9>; [65.00%]
  else
    goto <bb 7>; [35.00%]

  <bb 7> [local count: 577812955]:
  SR.80_62 = MEM[(const struct __normal_iterator &)SR.354_185 + 24];
  MEM[(struct adaptor_cursor *)&pos] = SR.80_62;
  MEM[(struct box *)&D.417725].value = pos;
  SR.396_209 = MEM[(struct adaptor_cursor *)&D.417725];
  _64 = it_30 + 4;
  if (_64 != SR.396_209)
    goto <bb 8>; [70.00%]
  else
    goto <bb 6>; [30.00%]

  <bb 8> [local count: 404469068]:
  _65 = MEM[(const int &)it_30 + 4];
  _66 = (long int) _65;
  _67 = _66 + it$16$sum__115;
  goto <bb 5>; [100.00%]

  <bb 9> [local count: 1073279389]:
  _32 = it_30 - SR.352_183;
  _33 = _32 /[ex] 4;
  D.357125 ={v} {CLOBBER};
  D.311383 ={v} {CLOBBER};
  return _33;

which makes this loop about 10x slower on my computer.

GCC 8 also generates lots of code setting up the function that GCC 7 manages to
eliminate.


This regression was introduced by r255510:

  2017-12-08  Martin Jambor  <mjam...@suse.cz>

        PR tree-optimization/83141
        * tree-sra.c (contains_vce_or_bfcref_p): Move up in the file, also
        test for MEM_REFs implicitely changing types with padding.  Remove
        inline keyword.
        (build_accesses_from_assign): Added contains_vce_or_bfcref_p checks.


To reproduce the problem, compile the attached file as

  g++ -O2 -S ranges.ii

and notice the difference in the generated code.
