Hello! The assert in create_pre_exit in mode-switching.c expects a return copy pair with nothing in between. However, the compiler starts the mode switching pass with the following sequence:
(insn 19 18 16 2 (set (reg:V2SF 21 xmm0)
        (mem/c:V2SF (plus:DI (reg/f:DI 7 sp)
                (const_int -72 [0xffffffffffffffb8])) [0 S8 A64])) "pr88070.c":8 1157 {*movv2sf_internal}
     (nil))
(insn 16 19 20 2 (set (reg:V2SF 0 ax [orig:91 <retval> ] [91])
        (reg:V2SF 0 ax [89])) "pr88070.c":8 1157 {*movv2sf_internal}
     (nil))
(insn 20 16 21 2 (unspec_volatile [
            (const_int 0 [0])
        ] UNSPECV_BLOCKAGE) "pr88070.c":8 710 {blockage}
     (nil))
(insn 21 20 23 2 (use (reg:V2SF 21 xmm0)) "pr88070.c":8 -1
     (nil))

Please note how (insn 16) interferes with the (insn 19)-(insn 21) return copy pair. The culprit is the blockage instruction (insn 20), which causes the sched1 pass (the pre-reload scheduler) to skip marking (insn 19) as an unmovable instruction (as an insn dependent on the return use insn), so the scheduler is free to schedule (insn 16) between the return copy pair (insn 19)-(insn 21).

The extra instruction is generated as a kludge in expand_function_end to prevent instructions that may trap from being scheduled into the function epilogue. However, the same blockage is generated under exactly the same conditions earlier in expand_function_end. That first blockage already prevents unwanted scheduling into the epilogue, so there is no need for the second one. The attached patch removes the kludge.

BTW: the extra blockage would crash compilation for all mode-switching targets, also with pre-reload mode switching; the post-reload vzeroupper insertion merely trips the x86 target on a generic problem in the middle-end.

2018-11-19  Uros Bizjak  <ubiz...@gmail.com>

	PR middle-end/88070
	* function.c (expand_function_end): Remove kludge that generates
	second blockage insn.

testsuite/ChangeLog:

2018-11-19  Uros Bizjak  <ubiz...@gmail.com>

	PR middle-end/88070
	* gcc.target/i386/pr88070.c: New test.

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} for all default languages, obj-c++ and go.

OK for mainline and release branches?

Uros.
Index: function.c
===================================================================
--- function.c	(revision 266278)
+++ function.c	(working copy)
@@ -5447,13 +5447,6 @@ expand_function_end (void)
   if (naked_return_label)
     emit_label (naked_return_label);
 
-  /* @@@ This is a kludge.  We want to ensure that instructions that
-     may trap are not moved into the epilogue by scheduling, because
-     we don't always emit unwind information for the epilogue.  */
-  if (cfun->can_throw_non_call_exceptions
-      && targetm_common.except_unwind_info (&global_options) != UI_SJLJ)
-    emit_insn (gen_blockage ());
-
   /* If stack protection is enabled for this function, check the guard.  */
   if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
     stack_protect_epilogue ();
Index: testsuite/gcc.target/i386/pr88070.c
===================================================================
--- testsuite/gcc.target/i386/pr88070.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr88070.c	(working copy)
@@ -0,0 +1,12 @@
+/* PR target/88070 */
+/* { dg-do compile } */
+/* { dg-options "-O -fexpensive-optimizations -fnon-call-exceptions -fschedule-insns -fno-dce -fno-dse -mavx" } */
+
+typedef float vfloat2 __attribute__ ((__vector_size__ (2 * sizeof (float))));
+
+vfloat2
+test1float2 (float c)
+{
+  vfloat2 v = { c, c };
+  return v;
+}