https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 7 Apr 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
>
> --- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Has somebody the benchmark around to retry with current trunk, with
> -f{,no-}split-paths and compare that to some older trunk and gcc6?

On a broadwell machine I get (-O3 -march=native)

gcc6: 5507.42 Mflops
gcc7: 4787.26 Mflops
gcc7: 5435.08 Mflops [-fno-split-paths]

so the RTL if-conversion works now unless inhibited by path splitting.

What path splitting does is mostly undone by loop disambiguation
which re-creates the merger so path splitting just made the loop
multiple exit (without simplifying the duplicated exit condition).

So we can add more heuristics to tame down loop splitting, for example
never duplicating a joiner that has an exit. Or adding to the quite
stupid if-cvt mitigation code (missing the minmax case). Or add even
more outs to the threading opportunity detection code...

We currently find that

  t_175 = PHI <t_184(6), ab_177(7)>

in the merger exposes a threading opportunity because it has one arg
that is unchanged over the latch (t_184 over 6->8) and it has a use in
the threading destination (in the controlling condition even).

This all just exposes that path splitting is not well integrated into
what it tries to expose (threading). IMHO it should have been part
of backwards/forward threading. But that ship has sailed (Jeff
approved it). I've tried to fixup after the MIA authors. But well.

I can fixup by removing the pass again. Or adding more oddball
heuristics.

This case which seems important for x86_64 is

  for (i=j+1; i<M; i++)
    {
      double ab = fabs(A[i][j]);
      if ( ab > t)
        {
          jp = i;
          t = ab;
        }
    }

so reducing MAX plus remembering the index of the maximum value. We're
not phiopt-ing that to MAX because it might not be profitable (the
condition has to remain).

So path splitting could be profitable on some archs, IFF we didn't
re-create that shared latch right afterwards anyway (and forget to
propagate the single-arg PHIs resulting from the BB duplication).
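
For readers without the benchmark at hand, here is a minimal standalone sketch of the loop shape discussed in this comment. It is not taken from the benchmark source: the matrix size `M`, the function names `pivot_branchy`, `pivot_select` and `self_check`, and the initial values of `t` and `jp` are all assumptions made for illustration. The second function is a hand-written approximation of what if-conversion aims for, with one comparison feeding two selects; it also shows why the condition has to remain even if `t` were turned into a MAX, since `jp` still depends on it.

```c
#include <assert.h>
#include <math.h>

#define M 4  /* hypothetical matrix size, for illustration only */

/* Branchy form, mirroring the loop quoted in the comment.  Starting
   from t = |A[j][j]| and jp = j is an assumption, not from the source.  */
static int pivot_branchy(double A[M][M], int j, double *tmax)
{
    double t = fabs(A[j][j]);
    int jp = j;
    for (int i = j + 1; i < M; i++) {
        double ab = fabs(A[i][j]);
        if (ab > t) {
            jp = i;
            t = ab;
        }
    }
    *tmax = t;
    return jp;
}

/* Hand-written select form: the comparison is evaluated once and
   drives two conditional selects, so there is no branch in the body.
   The comparison cannot be dropped because jp depends on it.  */
static int pivot_select(double A[M][M], int j, double *tmax)
{
    double t = fabs(A[j][j]);
    int jp = j;
    for (int i = j + 1; i < M; i++) {
        double ab = fabs(A[i][j]);
        int take = ab > t;
        jp = take ? i : jp;
        t = take ? ab : t;
    }
    *tmax = t;
    return jp;
}

/* Checks that both forms agree on every column of a small matrix.  */
static void self_check(void)
{
    double A[M][M] = {
        { 1.0,  0.0,  0.0, 0.0},
        {-7.0,  2.0,  0.0, 0.0},
        { 3.0, -1.0,  4.0, 0.0},
        { 5.0,  8.0, -9.0, 6.0},
    };
    for (int j = 0; j < M; j++) {
        double t1, t2;
        int p1 = pivot_branchy(A, j, &t1);
        int p2 = pivot_select(A, j, &t2);
        assert(p1 == p2);
        assert(t1 == t2);
    }
}
```

Whether a compiler actually emits conditional moves or min/max instructions for either form depends on the target and the options used, so inspect the output of `-S` rather than assuming the two compile differently.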