https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85762
            Bug ID: 85762
           Summary: [8/9 Regression] range-v3 abstraction overhead not
                    optimized away
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Created attachment 44124
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44124&action=edit
preprocessed source code for run_range()

GCC 8 is less aggressive than earlier versions when eliminating abstraction
overhead in the range-v3 library, which can be seen with the function

    #include <vector>
    #include <range/v3/all.hpp>

    long run_range(std::vector<int> const &lengths, long to_find)
    {
      auto const found_index = ranges::distance(lengths
              | ranges::view::transform(ranges::convert_to<long>{})
              | ranges::view::partial_sum()
              | ranges::view::take_while([=](auto const i) {
                    return !(to_find < i);
                }));
      return found_index;
    }

GCC 7 compiled the loop to

    <bb 5> [10.87%]:
    # it$_M_current_41 = PHI <_6(4), _27(8)>
    # it$16_26 = PHI <it$16_24(4), _31(8)>
    _53 = to_find_2(D) < it$16_26;

    <bb 6> [100.00%]:
    # it$_M_current_23 = PHI <it$_M_current_41(5), _27(7)>
    _20 = _7 == it$_M_current_23;
    _5 = _20 | _53;
    if (_5 != 0)
      goto <bb 9>; [7.36%]
    else
      goto <bb 7>; [92.64%]

    <bb 7> [92.60%]:
    _27 = it$_M_current_23 + 4;
    if (_7 != _27)
      goto <bb 8>; [3.75%]
    else
      goto <bb 6>; [96.25%]

    <bb 8> [3.47%]:
    _29 = MEM[(const int &)it$_M_current_23 + 4];
    _30 = (long int) _29;
    _31 = it$16_26 + _30;
    goto <bb 5>; [100.00%]

    <bb 9> [7.36%]:
    _33 = (long int) it$_M_current_23;
    _34 = (long int) _6;
    _35 = _33 - _34;
    _36 = _35 /[ex] 4;
    return _36;

while the loop compiled by GCC 8 updates some structures in each iteration

    <bb 5> [local count: 1478210893]:
    # it_47 = PHI <SR.352_183(4), _64(8)>
    # it$16$sum__115 = PHI <SR.353_184(4), _67(8)>
    _42 = to_find_2(D) < it$16$sum__115;

    <bb 6> [local count: 1651554780]:
    # it_30 = PHI <it_47(5), _64(7)>
    _46 = it_30 == SR.355_137;
    _40 = _42 | _46;
    if (_40 != 0)
      goto <bb 9>; [65.00%]
    else
      goto <bb 7>; [35.00%]

    <bb 7> [local count: 577812955]:
    SR.80_62 = MEM[(const struct __normal_iterator &)SR.354_185 + 24];
    MEM[(struct adaptor_cursor *)&pos] = SR.80_62;
    MEM[(struct box *)&D.417725].value = pos;
    SR.396_209 = MEM[(struct adaptor_cursor *)&D.417725];
    _64 = it_30 + 4;
    if (_64 != SR.396_209)
      goto <bb 8>; [70.00%]
    else
      goto <bb 6>; [30.00%]

    <bb 8> [local count: 404469068]:
    _65 = MEM[(const int &)it_30 + 4];
    _66 = (long int) _65;
    _67 = _66 + it$16$sum__115;
    goto <bb 5>; [100.00%]

    <bb 9> [local count: 1073279389]:
    _32 = it_30 - SR.352_183;
    _33 = _32 /[ex] 4;
    D.357125 ={v} {CLOBBER};
    D.311383 ={v} {CLOBBER};
    return _33;

which makes this loop about 10x slower on my computer. GCC 8 also generates
lots of code setting up the function that GCC 7 manages to eliminate.

This regression was introduced by r255510:

    2017-12-08  Martin Jambor  <mjam...@suse.cz>

        PR tree-optimization/83141
        * tree-sra.c (contains_vce_or_bfcref_p): Move up in the file, also
        test for MEM_REFs implicitely changing types with padding.  Remove
        inline keyword.
        (build_accesses_from_assign): Added contains_vce_or_bfcref_p checks.

To reproduce the problem, compile the attached file as

    g++ -O2 -S ranges.ii

and notice the difference in the generated code.
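
For reference, the range pipeline in run_range() counts the leading elements
whose running sum does not exceed to_find. A hand-written loop that should
give the same result is sketched below. The name run_range_manual is
hypothetical and is not part of the attachment or of range-v3; it is only
meant to illustrate the loop shape that the GCC 7 dump comes close to.

```cpp
#include <vector>

// Hypothetical hand-written counterpart of run_range(); an illustration
// only, not code from the bug report or from range-v3.
long run_range_manual(std::vector<int> const &lengths, long to_find)
{
  long sum = 0;    // running total, mirrors view::partial_sum over longs
  long count = 0;  // number of elements accepted so far, mirrors distance()
  for (int len : lengths) {
    sum += static_cast<long>(len);  // mirrors convert_to<long> + partial_sum
    if (to_find < sum)              // take_while predicate !(to_find < i) fails
      break;
    ++count;
  }
  return count;
}
```

Comparing the assembly for this function with that of run_range() at -O2
may help show how much of the abstraction overhead remains.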