On Tue, Sep 3, 2019 at 1:33 PM Richard Biener
<richard.guent...@gmail.com> wrote:

> > > Note:
> > > Removing limit of cost would introduce lots of regressions in SPEC2017 as 
> > > follow
> > > --------------------------------
> > > 531.deepsjeng_r  -7.18%
> > > 548.exchange_r  -6.70%
> > > 557.xz_r -6.74%
> > > 508.namd_r -2.81%
> > > 527.cam4_r -6.48%
> > > 544.nab_r -3.99%
> > >
> > > Tested on skylake server.
> > > -------------------------------------
> > > How about  changing cost from 2 to 8 until we figure out a better number.
> >
> > Certainly works for me.  Note the STV code uses the "other" sse_to_integer
> > number and the numbers in question here are those for the RA.  There's
> > a multitude of values used in the tables here, including some a lot larger.
> > So the overall bumping to 8 certainly was the wrong thing to do and instead
> > individual numbers should have been adjusted (didn't look at the history
> > of that bumping).
>
> For reference:
>
> r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
>
>     PR target/32413
>     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
>     moves between MMX/SSE registers to at least 8 units to prevent
>     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
>
> should probably have been "twice the cost of X" or something like that
> instead where X be some reg-reg move cost.

Thanks for the reference. It looks that the patch fixes the issue in
the wrong place, this should be solved in
inline_secondary_memory_needed:

      /* Between SSE and general, we have moves no larger than word size.  */
      if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
           || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
           || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
        return true;

as an alternative to implement QI and HImode moves as a SImode move
between SSE and int<->SSE registers. We have
ix86_secondary_memory_needed_mode that extends QI and HImode secondary
memory to SImode, so this should solve PR32413.

Other than that, what to do with the bizzare property of direct moves
that benchmark far worse than indirect moves? I was expecting that
keeping the cost of direct inter-regset moves just a bit below the
cost of int<->mem<->xmm, but (much ?) higher than itra-regset moves
would prevent unwanted wandering of values between register sets,
while still generating the direct move when needed. While this almost
fixes the runtime regression, it is not clear to me from Hongtao Liu's
message if  Richard's 2019-08-27 fixes the remaining regression or
not). Liu, can you please clarify?

> >  For example Pentium4 has quite high bases for move
> > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20
> > while the opposite 12.
> >
> > So yes, we want to revert the patch by applying its effect to the
> > individual cost tables so we can revisit this for the still interesting
> > micro-architectures.

Uros.

Reply via email to