On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou <ebotca...@adacore.com> wrote:
>
> > The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
> > Later, the blockage was moved after the return label as a fix for
> > PR25176 [3] in r107871 [4].
> >
> > After that, r122626 [5] moved the blockage after the label for the
> > naked return from the function. Relevant posts from the gcc-patches@
> > ML are at [6], [7]. However, the posts contain no concrete examples
> > of how the scheduler moves instructions from a different BB around
> > the blockage insn; they just show that there is a jump around the
> > blockage when __builtin_return is used. I was under the impression
> > that the scheduler is unable to move instructions over BB boundaries.
>
> The scheduler works on extended basic blocks. The [7] post gives a
> rather convincing explanation, and there is a C++ testcase under
> PR rtl-opt/14381.
>
> > A mystery is the tree-ssa merge [8] that copies the hunk, moved in
> > r122626 [5], back to its original position. From this revision
> > onwards, we emit two blockages.
>
> It's the dataflow merge, not the tree-ssa merge. The additional
> blockage might be needed for DF.
>
> Given that the current PR is totally artificial, I think that we need
> to be quite conservative and only do something on mainline. And even
> there I'd be rather conservative and remove the kludge only for
> targets that emit unwind information in the epilogue (among which
> there is x86, I presume).
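For reference, the __builtin_apply/__builtin_return pattern in question
looks roughly like this (a minimal sketch, not the C++ testcase from the
PR; the function names and the stack-argument size are made up):

--cut here--
/* __builtin_return creates a second, "naked" return path, so the
   epilogue can be reached both through the normal return label and
   through the naked-return label that the blockage was moved after.  */

int target (int);

int forwarder (int i)
{
  /* Capture the incoming arguments so they can be forwarded.  */
  void *args = __builtin_apply_args ();

  /* Call target with the same arguments and return its result
     directly, bypassing the normal return sequence -- this is the
     jump around the blockage mentioned in the posts.  16 is just an
     upper bound on the stack argument area for this sketch.  */
  __builtin_return (__builtin_apply ((void (*)()) target, args, 16));
}
--cut here--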
Hm, I think I'd rather go with a somewhat target-dependent patch:

--cut here--
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 370a49e90a9c..de75efe2b6c9 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map, const int *num_modes)
       if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
           && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
           && GET_CODE (PATTERN (last_insn)) == USE
-          && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
+          && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
+
+          /* x86 targets use mode-switching infrastructure to
+             conditionally insert vzeroupper instruction at the exit
+             from the function and there is no need to switch the
+             mode before the return value copy.  The vzeroupper insertion
+             pass runs after reload, so use !reload_completed as a stand-in
+             for x86 to skip the search for return value copy insn.
+
+             N.b.: the code below assumes that return copy insn
+             immediately precedes its corresponding use insn.  This
+             assumption does not hold after reload, since sched1 pass
+             can reschedule return copy insn away from its
+             corresponding use insn.  */
+          && !reload_completed)
         {
           int ret_start = REGNO (ret_reg);
           int nregs = REG_NREGS (ret_reg);
--cut here--

WDYT?

Uros.
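P.S. To illustrate the vzeroupper case the comment refers to (just a
sketch, not part of the patch; function name is made up): with -mavx,
the post-reload vzeroupper pass, which is built on the mode-switching
infrastructure, places vzeroupper in the epilogue of a function that
dirtied the upper ymm halves, e.g.

--cut here--
#include <immintrin.h>

/* The 256-bit addition dirties the upper halves of the ymm registers,
   so when compiled with -mavx the vzeroupper pass emits vzeroupper
   before the return to avoid AVX/SSE transition penalties.  The scalar
   double return value lives in the low half of xmm0, so zeroing the
   upper halves does not clobber it.  */
double first_sum (__m256d x, __m256d y)
{
  return _mm256_cvtsd_f64 (_mm256_add_pd (x, y));
}
--cut here--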