On 11/20/18 3:24 AM, Uros Bizjak wrote:
> On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou <ebotca...@adacore.com> wrote:
>>
>>> The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
>>> Later, the blockage was moved after the return label as a fix for
>>> PR25176 [3] in r107871 [4].
>>>
>>> After that, r122626 [5] moved the blockage after the label for the
>>> naked return from the function.  The relevant posts from the
>>> gcc-patches@ ML are at [6], [7].  However, the posts give no concrete
>>> example of how the scheduler moves instructions from different BBs
>>> around the blockage insn; they only show that there is a jump around
>>> the blockage when __builtin_return is used.  I was under the
>>> impression that the scheduler is unable to move instructions over BB
>>> boundaries.
>>
>> The scheduler works on extended basic blocks.  The [7] post gives a
>> rather convincing explanation, and there is a C++ testcase under
>> PR rtl-opt/14381.
>>
>>> A mystery is the tree-ssa merge [8] that copies the hunk, moved in
>>> r122626 [5], back to its original position.  From this revision
>>> onwards, we emit two blockages.
>>
>> It's the dataflow merge, not the tree-ssa merge.  The additional
>> blockage might be needed for DF.
>>
>> Given that the current PR is totally artificial, I think that we need
>> to be quite conservative and only do something on mainline.  And even
>> there I'd be rather conservative and remove the kludge only for
>> targets that emit unwind information in the epilogue (among which
>> there is x86, I presume).
>
> Hm, I think I'll rather go with a somewhat target-dependent patch:
>
> --cut here--
> diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
> index 370a49e90a9c..de75efe2b6c9 100644
> --- a/gcc/mode-switching.c
> +++ b/gcc/mode-switching.c
> @@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map,
> const int *num_modes)
>        if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
>            && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
>            && GET_CODE (PATTERN (last_insn)) == USE
> -          && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
> +          && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
> +
> +          /* x86 targets use the mode-switching infrastructure to
> +             conditionally insert the vzeroupper instruction at the exit
> +             from the function, and there is no need to switch the
> +             mode before the return value copy.  The vzeroupper insertion
> +             pass runs after reload, so use !reload_completed as a stand-in
> +             for x86 to skip the search for the return value copy insn.

Note that the GCN target may well end up needing a late mode-switching
pass -- it's got kind of an inverse problem to solve -- where to place
initializations of the exec register, which is needed when we want to do
scalar ops in a SIMD unit.
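For readers following along, here is a minimal example of the x86 case the
patch comment describes -- my own illustration, not taken from the PR, and
the function name is made up.  Compiled with gcc -O2 -mavx, the post-reload
vzeroupper insertion pass uses the mode-switching infrastructure to place a
vzeroupper before the function exit:

--cut here--
/* Illustrative only (not the PR testcase).  With -mavx, the vzeroupper
   insertion pass places a vzeroupper before the return.  Since
   vzeroupper preserves the low 128 bits of the registers, it does not
   need to go before the return value copy -- which is the point of the
   !reload_completed stand-in in the patch above.  */
#include <immintrin.h>

double
sum_upper_lower (const double *p)
{
  __m256d v = _mm256_loadu_pd (p);             /* 256-bit AVX load */
  __m128d lo = _mm256_castpd256_pd128 (v);     /* low 128 bits */
  __m128d hi = _mm256_extractf128_pd (v, 1);   /* high 128 bits */
  __m128d s = _mm_add_pd (lo, hi);
  s = _mm_hadd_pd (s, s);                      /* horizontal add */
  return _mm_cvtsd_f64 (s);   /* vzeroupper is emitted here, after the
                                 return value is already in xmm0 */
}
--cut here--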
I thought the SH used mode switching as well.  But I can't recall if it
was run before register allocation & reload.

jeff
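For anyone trying to reproduce the jump around the blockage mentioned
upthread, the construct at issue is __builtin_return.  A rough sketch of
its typical use -- not the PR14381 testcase (which is C++); the callee
and the size argument here are made up:

--cut here--
/* A forwarding wrapper built from GCC's __builtin_apply_args,
   __builtin_apply and __builtin_return.  __builtin_return exits
   through the "naked return" path, whose label placement relative to
   the blockage insn is what the thread is discussing.  */
void real_func (int a, int b);   /* hypothetical callee */

void
wrapper_func (int a, int b)
{
  void *args = __builtin_apply_args ();
  /* 64 is a guessed upper bound on the argument block size; the
     correct value is target-dependent.  */
  void *ret = __builtin_apply ((void (*) ()) real_func, args, 64);
  __builtin_return (ret);
}
--cut here--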