[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

crazylht at gmail dot com via Gcc-bugs Tue, 20 Oct 2020 01:00:10 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366


--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Alexander Monakov from comment #5)
> afaict LRA is just following IRA decisions, and IRA allocates that pseudo to
> memory due to costs.
> 
> Not sure where strange cost is coming from, but it depends on x86 tuning
> options: with -mtune=skylake we get the expected code, with -mtune=haswell
> we get 128-bit vectors right and extra load for 256-bit, with -mtune=generic
> both cases have extra loads.

In scan_one_insn (ira-costs.c):
----
  /* If this insn loads a parameter from its stack slot, then it
     represents a savings, rather than a cost, if the parameter is
     stored in memory.  Record this fact.

     Similarly if we're loading other constants from memory (constant
     pool, TOC references, small data areas, etc) and this is the only
     assignment to the destination pseudo.

     Don't do this if SET_SRC (set) isn't a general operand, if it is
     a memory requiring special instructions to load it, decreasing
     mem_cost might result in it being loaded using the specialized
     instruction into a register, then stored into stack and loaded
     again from the stack.  See PR52208.

     Don't do this if SET_SRC (set) has side effect.  See PR56124.  */
  if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
      && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
      && ((MEM_P (XEXP (note, 0))
           && !side_effects_p (SET_SRC (set)))
          || (CONSTANT_P (XEXP (note, 0))
              && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
                                                XEXP (note, 0))
              && REG_N_SETS (REGNO (SET_DEST (set))) == 1))
      && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set)))
      /* LRA does not use equiv with a symbol for PIC code.  */
      && (! ira_use_lra_p || ! pic_offset_table_rtx
          || ! contains_symbol_ref_p (XEXP (note, 0))))
    {
      enum reg_class cl = GENERAL_REGS;
      rtx reg = SET_DEST (set);
      int num = COST_INDEX (REGNO (reg));

      COSTS (costs, num)->mem_cost
        -= ira_memory_move_cost[GET_MODE (reg)][cl][1] * frequency;
      record_address_regs (GET_MODE (SET_SRC (set)),
                           MEM_ADDR_SPACE (SET_SRC (set)),
                           XEXP (SET_SRC (set), 0), 0, MEM, SCRATCH,
                           frequency * 2);
      counted_mem = true;
    }
---

for 

(insn 9 8 11 3 (set (reg:V2DI 88 [ _16 ])
        (mem:V2DI (plus:DI (reg/v/f:DI 91 [ input ])
                (reg:DI 89 [ ivtmp.11 ])) [0 MEM[(const __m128i *
{ref-all})input_7(D) + ivtmp.11_40 * 1]+0 S16 A128]))
"/export/users2/liuhongt/tools-build/build_gcc11_master_debug/gcc/include/emmintrin.h":697:10
1405 {movv2di_internal}

the mem_cost for r88 is reduced by
ira_memory_move_cost[V2DImode][GENERAL_REGS][1], which gives -11808 as its
initial value. In reality it should be reduced by
ira_memory_move_cost[V2DImode][SSE_REGS][1], which would give -5905 as the
initial value. So it seems we add too much preference for memory here.

Later, in record_operand_costs, IRA finds that r88 is also used in shift and
ior instructions, so the mem_cost for r88 increases. It still ends up lower
than the cost of SSE_REGS, because of the excessive preference for memory
added above. In the end IRA chooses memory for r88 because it has the lowest
cost, which is suboptimal.

a10(r88,l1) costs: SSE_FIRST_REG:0,0 NO_REX_SSE_REGS:0,0 SSE_REGS:0,0
MEM:-984,-984
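
To make the arithmetic concrete, here is a small C sketch of the numbers
quoted above. It is a toy model, not GCC code, and every name in it is
hypothetical. It rests on two assumptions of mine: that the costs added after
the initial adjustment would be the same whichever move cost is subtracted
first, and that the value printed in the dump is directly comparable with the
initial value.

```c
/* Toy model of the numbers quoted in this comment; this is not GCC code.
   All names here are hypothetical.  */

/* Initial mem_cost of r88 with the GENERAL_REGS move cost (current code).  */
#define INIT_MEM_COST_GENERAL (-11808)
/* Initial mem_cost of r88 if the SSE_REGS move cost were used instead.  */
#define INIT_MEM_COST_SSE (-5905)
/* Final costs printed in the IRA dump for a10(r88,l1).  */
#define FINAL_MEM_COST (-984)
#define FINAL_SSE_REGS_COST 0

/* Amount added to mem_cost after the initial adjustment, derived from the
   quoted numbers: final value minus initial value.  */
int
later_increase (void)
{
  return FINAL_MEM_COST - INIT_MEM_COST_GENERAL;
}

/* Final mem_cost for a given initial value, ASSUMING the later increase
   does not depend on the initial value.  */
int
final_mem_cost (int initial)
{
  return initial + later_increase ();
}

/* Return 1 if memory is strictly cheaper than SSE_REGS, else 0.  */
int
prefers_memory (int mem_cost)
{
  return mem_cost < FINAL_SSE_REGS_COST;
}
```

Under those assumptions, final_mem_cost (INIT_MEM_COST_GENERAL) is -984, so
memory wins as in the dump, while final_mem_cost (INIT_MEM_COST_SSE) would be
4919, so SSE_REGS would win.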

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
