On Mon, 2023-12-18 at 18:45 +0100, Jakub Jelinek wrote:
> On Tue, Dec 19, 2023 at 12:48:46AM +0800, Xi Ruoyao wrote:
> > > > gcc/ChangeLog:
> > > > 
> > > >         PR middle-end/113033
> > > >         * expmed.cc (expand_shift_1): When expanding rotate shift, call
> > > >         negate_rtx instead of simplify_gen_unary (NEG, ...).
> > 
> > > The key difference being that using negate_rtx will go through the
> > > expander which knows how to synthesize negation whereas 
> > > simplify_gen_unary will just generate a (neg ...) and assume it matches 
> > > something in the backend, right?
> > 
> > For PR113033 the key difference (to me) is negate_rtx emits an insn to
> > set a new pseudo reg to -x.  So the result will be
> > 
> > (set (reg:SI 81) (neg:SI (reg:SI 80)))
> > 
> > then
> > 
> > (and (reg:SI 81) (const_int 31))
> > 
> > instead of a consolidated
> > 
> > (and:SI (neg:SI (reg:SI IN)) (const_int 63))
> > 
> > AFAIK no backends have an instruction doing "negate an operand then and
> > bitwisely".
> 
> Can you explain why it doesn't work as is though?
> I mean, expand_shift_1 with that (and (neg (reg ...)) (const_int ...))
> should try to legitimize the operand (e.g. in maybe_legitimize_operand
> -> force_operand and force_operand should be able to deal with that,
> AND is binary op, so it recurses on the 2 operands and NEG is UNARY_P,
> so the recursion should deal with that if it is not general_operand.

It happens with vector left rotate:

V test (V a, int x)
{
  int _1; 
  V _4; 

  <bb 2> [local count: 1073741824]:
  _1 = x_2(D) & 31; 
  _4 = a_3(D) r<< _1; 
  return _4; 

}

Here V is in V4SImode.  With other_amount = (and (neg (reg 85))
(const_int 31)), we end up calling

expand_shift_1 (
  code = RSHIFT_EXPR,
  mode = V4SImode,
  shifted = (reg:V4SI 82),
  amount = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
  target = (reg:V4SI 84),
  unsignedp = true,
  may_fail = false)

It then calls

expand_binop (
  mode = V4SImode,
  lshr_optab,
  op0 = (reg:V4SI 82),
  op1 = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
  target = (reg:V4SI 84),
  unsignedp=1,
  methods=OPTAB_DIRECT)

In expand_binop:

rtx vop1 = expand_vector_broadcast (mode, op1);

LoongArch backend don't have vec_duplicate (well, broadcasting is
implemented as a special case of vec_init and maybe this is not so
good...), so finally we get:

  vec = rtvec_alloc (n); 
  for (int i = 0; i < n; ++i) 
    RTVEC_ELT (vec, i) = op;
  rtx ret = gen_reg_rtx (vmode);
  emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));

here "op" is (and:SI (neg:SI (reg:SI 85)) (const_int 31)), thus it
evaded expansion :(.

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University

Reply via email to