[Bug tree-optimization/88440] size optimization of memcpy-like code

rguenth at gcc dot gnu.org Wed, 22 May 2019 04:49:40 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440


--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ick.

static inline void
check_pseudos_live_through_calls (int regno,
                                  HARD_REG_SET last_call_used_reg_set,
                                  rtx_insn *call_insn)
{
...
  for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
    if (targetm.hard_regno_call_part_clobbered (call_insn, hr,
                                                PSEUDO_REGNO_MODE (regno)))
      add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
                           PSEUDO_REGNO_MODE (regno), hr);

This loop repeatedly recomputes an implicit hard-reg set (the
hard regs that are partly clobbered by the call) for the _same_
actual instruction, since check_pseudos_live_through_calls is called
via

      /* Mark each defined value as live.  We need to do this for
         unused values because they still conflict with quantities
         that are live at the time of the definition.  */
      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
        {
          if (reg->type != OP_IN)
            {
              update_pseudo_point (reg->regno, curr_point, USE_POINT);
              mark_regno_live (reg->regno, reg->biggest_mode);
              check_pseudos_live_through_calls (reg->regno,
                                                last_call_used_reg_set,
                                                call_insn);
...
        }

and

              EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
                {
                  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
                                    this_call_used_reg_set);

                  if (flush)
                    check_pseudos_live_through_calls (j,
                                                      last_call_used_reg_set,
                                                      last_call_insn);
                }

and

      /* Mark each used value as live.  */
      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
        if (reg->type != OP_OUT)
          {
            if (reg->type == OP_IN)
              update_pseudo_point (reg->regno, curr_point, USE_POINT);
            mark_regno_live (reg->regno, reg->biggest_mode);
            check_pseudos_live_through_calls (reg->regno,
                                              last_call_used_reg_set,
                                              call_insn);
          }

and

  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, j, bi)
    {
      if (sparseset_cardinality (pseudos_live_through_calls) == 0)
        break;
      if (sparseset_bit_p (pseudos_live_through_calls, j))
        check_pseudos_live_through_calls (j, last_call_used_reg_set,
                                          call_insn);
    }

The pseudo's mode may change, but I guess usually it doesn't.  I also wonder
why the target hook doesn't return a hard-reg-set ...

That said, the above code doesn't scale well, at least for functions with a
lot of calls.  Also, the passed call_insn isn't the current insn and
might even be NULL.  No target other than aarch64 even looks at the actual
instruction (all the more an argument for re-designing the hook with its use
in mind).
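
To illustrate the kind of re-design meant here, a minimal sketch of a toy
model (not GCC code): the part-clobbered set is computed once per call and
mode, cached, and each pseudo then only needs a single OR.  Everything in it
is an assumption made for illustration: the 64-register bitmask type,
toy_part_clobbered, struct call_clobber_cache and add_call_conflicts are
hypothetical names.  It also ignores that add_to_hard_reg_set takes the mode
into account when adding a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not GCC code: 64 hard registers and 4 modes.  */
enum { NUM_HARD_REGS = 64, NUM_MODES = 4 };
typedef uint64_t hard_reg_set;	/* one bit per hard register */

static unsigned long hook_calls;	/* counts per-register hook queries */

/* Hypothetical per-register hook in the style of
   hard_regno_call_part_clobbered: registers 32..63 are treated as
   partly clobbered for modes 2 and 3.  */
static bool
toy_part_clobbered (unsigned hr, unsigned mode)
{
  hook_calls++;
  return hr >= 32 && mode >= 2;
}

/* Hypothetical per-call cache: one set per mode, filled on first use.  */
struct call_clobber_cache
{
  bool valid[NUM_MODES];
  hard_reg_set set[NUM_MODES];
};

/* Invalidate the cache; to be done whenever the call insn changes.  */
static void
cache_reset (struct call_clobber_cache *c)
{
  for (unsigned m = 0; m < NUM_MODES; m++)
    c->valid[m] = false;
}

/* Return the part-clobbered set for MODE, walking all hard registers
   only the first time a given mode is requested.  */
static hard_reg_set
part_clobbered_set (struct call_clobber_cache *c, unsigned mode)
{
  assert (mode < NUM_MODES);
  if (!c->valid[mode])
    {
      hard_reg_set s = 0;
      for (unsigned hr = 0; hr < NUM_HARD_REGS; hr++)
	if (toy_part_clobbered (hr, mode))
	  s |= (hard_reg_set) 1 << hr;
      c->set[mode] = s;
      c->valid[mode] = true;
    }
  return c->set[mode];
}

/* What the per-pseudo check reduces to: a single OR into the
   pseudo's conflict set.  */
static void
add_call_conflicts (hard_reg_set *conflicts, struct call_clobber_cache *c,
		    unsigned mode)
{
  *conflicts |= part_clobbered_set (c, mode);
}
```

In this model, two pseudos of the same mode that are live across the same
call cost 64 hook queries in total instead of 128.  A hook that returned the
set directly would make the cache unnecessary.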

I guess an artificial testcase with a lot of calls and a lot of live
pseudos (even single-BB) should show this issue easily.
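
A sketch of a generator for such a testcase, under assumptions: the
function name emit_testcase, the external function g and the shape of the
emitted code are all made up for illustration, and I have not checked that
the output actually reproduces the slowdown.  It emits one straight-line
function in which n_vals values are defined by opaque calls and stay live
across n_calls further calls.

```c
#include <stdio.h>

/* Write a single-basic-block C function to OUT in which N_VALS values
   remain live across N_CALLS calls to an external function.  Returns
   the total number of calls to g that were emitted.  */
static int
emit_testcase (FILE *out, int n_vals, int n_calls)
{
  int calls = 0;

  fprintf (out, "extern int g (int);\n\nint\nf (int x)\n{\n");

  /* Define the values via calls so they cannot be rematerialized.  */
  for (int i = 0; i < n_vals; i++, calls++)
    fprintf (out, "  int v%d = g (%d);\n", i, i);

  /* The calls that all of the values above have to live through.  */
  fprintf (out, "  int acc = x;\n");
  for (int i = 0; i < n_calls; i++, calls++)
    fprintf (out, "  acc += g (acc);\n");

  /* Use every value after the last call to keep it live until here.  */
  fprintf (out, "  return acc");
  for (int i = 0; i < n_vals; i++)
    fprintf (out, " + v%d", i);
  fprintf (out, ";\n}\n");

  return calls;
}
```

Calling emit_testcase (stdout, 1000, 1000) and compiling the result with
-O2 -ftime-report should show whether LRA time grows faster than linearly
as the two counts are scaled up.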

Samples: 579  of event 'cycles:ppp', Event count (approx.): 257134187434191     
Overhead  Command  Shared Object     Symbol                                     
  22.26%  f951     f951              [.] process_bb_lives
  15.06%  f951     f951              [.] ix86_hard_regno_call_part_clobbered
   8.55%  f951     f951              [.] concat
   6.88%  f951     f951              [.] find_base_term
   3.60%  f951     f951              [.] get_ref_base_and_extent
   3.27%  f951     f951              [.] find_base_term
   2.95%  f951     f951              [.] make_hard_regno_dead
