On 11/20/18 3:24 AM, Uros Bizjak wrote:
> On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou <ebotca...@adacore.com> wrote:
>>
>>> The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
>>> Later, the blockage was moved after the return label as a fix for
>>> PR25176 [3] in r107871 [4].
>>>
>>> After that, r122626 [5] moves the blockage after the label for the
>>> naked return from the function. Relevant posts from gcc-patches@ ML
>>> are at [6], [7]. However, the posts give no concrete examples of
>>> how the scheduler moves instructions from different BBs around the
>>> blockage insn; they just show that there is a jump around the
>>> blockage when __builtin_return is used. I was under the impression
>>> that the scheduler is unable to move instructions over BB boundaries.
>>
>> The scheduler works on extended basic blocks.  The [7] post gives a rather
>> convincing explanation and there is a C++ testcase under PR rtl-opt/14381.
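As a minimal illustration of the __builtin_return construct at issue (a
sketch written for this discussion, not the testcase from the PR):
functions built on __builtin_apply/__builtin_return take the naked
return path that jumps around the normal epilogue, which is where the
blockage placement matters.  The names and the 16-byte argument-block
size below are arbitrary choices for the example.

--cut here--
/* Hypothetical example: forward the argument to target () and hand its
   result back verbatim via the naked return path.  */
double target (double x) { return x + 1.0; }

double forwarder (double x)
{
  void *args = __builtin_apply_args ();   /* capture incoming arguments */
  void *ret  = __builtin_apply ((void (*) ()) target, args, 16);
  __builtin_return (ret);                 /* naked return path */
}
--cut here--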
>>
>>> A mystery is the tree-ssa merge [8] that copies back the hunk, moved
>>> in r122626 [5] to its original position. From this revision onwards,
>>> we emit two blockages.
>>
>> It's the dataflow merge, not the tree-ssa merge.  The additional blockage
>> might be needed for DF.
>>
>> Given that the current PR is totally artificial, I think that we need to be
>> quite conservative and only do something on mainline.  And even there I'd be
>> rather conservative and remove the kludge only for targets that emit unwind
>> information in the epilogue (among which is x86, I presume).
> 
> Hm, I think I'd rather go with a somewhat target-dependent patch:
> 
> --cut here--
> diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
> index 370a49e90a9c..de75efe2b6c9 100644
> --- a/gcc/mode-switching.c
> +++ b/gcc/mode-switching.c
> @@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map, const int *num_modes)
>         if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
>             && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
>             && GET_CODE (PATTERN (last_insn)) == USE
> -           && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
> +           && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
> +
> +           /* x86 targets use mode-switching infrastructure to
> +              conditionally insert vzeroupper instruction at the exit
> +              from the function and there is no need to switch the
> +              mode before the return value copy.  The vzeroupper insertion
> +              pass runs after reload, so use !reload_completed as a stand-in
> +              for x86 to skip the search for return value copy insn.
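To make the vzeroupper case in that comment concrete (a hand-written
sketch, not part of the patch): compiled with -mavx, a function like the
one below leaves the upper YMM halves dirty, and the post-reload
vzeroupper pass built on this mode-switching infrastructure inserts a
vzeroupper before the return.

--cut here--
#include <immintrin.h>

/* Sum four doubles with AVX.  The 256-bit load dirties the upper YMM
   halves, so GCC emits a vzeroupper before the return to spare callers
   the AVX->SSE transition penalty.  */
double sum4 (const double *p)
{
  __m256d v  = _mm256_loadu_pd (p);
  __m128d lo = _mm256_castpd256_pd128 (v);
  __m128d hi = _mm256_extractf128_pd (v, 1);
  __m128d s  = _mm_add_pd (lo, hi);
  s = _mm_add_pd (s, _mm_unpackhi_pd (s, s));
  return _mm_cvtsd_f64 (s);   /* vzeroupper goes just before the ret */
}
--cut here--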
Note that the GCN target may well end up needing a late mode switching
pass -- it's got kind of an inverse problem to solve: where to place
initializations of the exec register, which is needed when we want to
do scalar ops in a SIMD unit.

I thought the SH used mode switching as well, but I can't recall
whether it runs before register allocation & reload.

jeff

