On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou <ebotca...@adacore.com> wrote:
>
> > The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
> > Later, the blockage was moved after the return label as a fix for
> > PR25176 [3] in r107871 [4].
> >
> > After that, r122626 [5] moved the blockage after the label for the
> > naked return from the function.  The relevant posts from the
> > gcc-patches@ ML are at [6] and [7].  However, the posts give no
> > concrete example of the scheduler moving instructions from a
> > different BB around the blockage insn; they only show that there is
> > a jump around the blockage when __builtin_return is used.  I was
> > under the impression that the scheduler is unable to move
> > instructions across BB boundaries.
>
> The scheduler works on extended basic blocks.  The [7] post gives a rather
> convincing explanation and there is a C++ testcase under PR rtl-opt/14381.
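
(For the archives, a minimal sketch of the extended basic block shape
in question; this is hypothetical, not the testcase from PR14381:

--cut here--
/* bb2, the fall-through arm of the branch, has bb1 as its single
   predecessor, so bb1 and bb2 belong to the same extended basic
   block and the scheduler may move insns between them, i.e. across
   a basic block boundary, when the motion is otherwise safe.  The
   join point bb3 has two predecessors and starts a new EBB.  */
int
ebb_example (int x, int a, int b)
{
  int t = a * b;   /* bb1 */
  if (x == 0)      /* bb1 ends in a conditional jump */
    t += a;        /* bb2: fall-through, single predecessor */
  return t;        /* bb3: join point */
}
--cut here--

so instruction motion across a BB boundary is indeed possible within
one EBB.)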
>
> > A mystery is the tree-ssa merge [8], which copies the hunk moved in
> > r122626 [5] back to its original position.  From this revision
> > onwards, we emit two blockages.
>
> It's the dataflow merge, not the tree-ssa merge.  The additional blockage
> might be needed for DF.
>
> Given that the current PR is totally artificial, I think that we need to be
> quite conservative and only do something on mainline.  And even there I'd be
> rather conservative and remove the kludge only for targets that emit unwind
> information in the epilogue (among which there is x86 I presume).

Hm, I think I'd rather go with a somewhat target-dependent patch:

--cut here--
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 370a49e90a9c..de75efe2b6c9 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map, const int *num_modes)
        if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
            && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
            && GET_CODE (PATTERN (last_insn)) == USE
-           && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
+           && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
+
+           /* x86 targets use the mode-switching infrastructure to
+              conditionally insert a vzeroupper instruction at the exit
+              from the function, and there is no need to switch the
+              mode before the return value copy.  The vzeroupper insertion
+              pass runs after reload, so use !reload_completed as a stand-in
+              for x86 to skip the search for the return value copy insn.
+
+              N.B.: the code below assumes that the return copy insn
+              immediately precedes its corresponding use insn.  This
+              assumption does not hold after reload, since the sched1
+              pass can reschedule the return copy insn away from its
+              corresponding use insn.  */
+           && !reload_completed)
          {
            int ret_start = REGNO (ret_reg);
            int nregs = REG_NREGS (ret_reg);
--cut here--
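
For completeness, a hedged example of the case the new comment
describes (a sketch, not a regression test; assuming x86-64 with
-O2 -mavx): the function dirties the upper halves of the ymm
registers but returns a scalar, so the vzeroupper insertion pass,
running the mode-switching machinery after reload, emits a vzeroupper
on the path to the function exit.  There is no mode switch needed
before the return value copy, which is why the !reload_completed
check can skip the search for it:

--cut here--
#include <immintrin.h>

/* Compile with -O2 -mavx on x86-64.  The 256-bit load dirties the
   upper ymm halves, so GCC inserts vzeroupper before returning to
   a possibly non-AVX-aware caller.  */
double
sum4 (const double *p)
{
  __m256d v = _mm256_loadu_pd (p);          /* uses ymm registers */
  __m128d lo = _mm256_castpd256_pd128 (v);
  __m128d hi = _mm256_extractf128_pd (v, 1);
  __m128d s = _mm_add_pd (lo, hi);
  s = _mm_hadd_pd (s, s);                   /* horizontal add */
  return _mm_cvtsd_f64 (s);                 /* scalar return in %xmm0 */
}
--cut here--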

WDYT?

Uros.
