On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotca...@adacore.com> wrote: >> This sequence breaks assumption in mode-switching.c, that final >> function return register load operates in MODE_EXIT mode and triggere >> following code: >> >> for (j = n_entities - 1; j >= 0; j--) >> { >> int e = entity_map[j]; >> int mode = MODE_NEEDED (e, return_copy); >> >> if (mode != num_modes[e] && mode != MODE_EXIT (e)) >> break; >> } >> >> As discussed above, modes of loads, generated from __builtin_apply >> have no connection to function return mode. mode-switching.c does >> detect __builtin_apply situation and raises maybe_builtin_apply flag, >> but doesn't use it to short-circuit wrong check. In proposed patch, we >> detect this situation and raise force_late_switch in the same way, as >> SH4 does for its "late" fpscr emission. > > If I understand correctly, we need to insert the vzeroupper because the > function returns double in SSE registers but we generate an OImode load > instead of a DFmode load because of the __builtin_return. So we're in the > forced_late_switch case but we fail to recognize the tweaked return value load > since the number of registers doesn't match. > > If so, I'd rather add another special case, like for the SH4, instead of a > generic bypass for maybe_builtin_apply, something along the lines of: > > /* For the x86 with AVX, we might be using a larger load for a value > returned in SSE registers and we want to put the final mode switch > after this return value copy. */ > if (copy_start == ret_start > && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)] > && copy_num >= nregs > && OBJECT_P (SET_SRC (return_copy_pat))) > forced_late_switch = 1;
Yes, this approach also works. I assume it is OK to commit attached patch? 2012-11-05 Eric Botcazou <ebotca...@adacore.com> Uros Bizjak <ubiz...@gmail.com> * mode-switching.c (create_pre_exit): Force late switch for __builtin_return case, when value, returned in SSE register, was loaded using OImode load. Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c Thanks, Uros.
Index: mode-switching.c =================================================================== --- mode-switching.c (revision 193174) +++ mode-switching.c (working copy) @@ -1,6 +1,6 @@ /* CPU mode switching Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, - 2009, 2010 Free Software Foundation, Inc. + 2009, 2010, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -342,6 +342,17 @@ create_pre_exit (int n_entities, int *entity_map, } if (j >= 0) { + /* For the x86 with AVX, we might be using a larger + load for a value returned in SSE registers and we + want to put the final mode switch after this + return value copy. */ + if (copy_start == ret_start + && nregs + == hard_regno_nregs[ret_start][GET_MODE (ret_reg)] + && copy_num >= nregs + && OBJECT_P (SET_SRC (return_copy_pat))) + forced_late_switch = 1; + /* For the SH4, floating point loads depend on fpscr, thus we might need to put the final mode switch after the return value copy. That is still OK,