Hi Juzhe,

thanks, this looks pretty comprehensive already.

> +(define_expand "vec_perm<mode>"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand:V 1 "register_operand")
> +   (match_operand:V 2 "register_operand")
> +   (match_operand:<VINDEX> 3 "vector_perm_operand")]
> +  "TARGET_VECTOR && GET_MODE_NUNITS (<MODE>mode).is_constant ()"
> +  {
> +    riscv_vector::expand_vec_perm (operands[0], operands[1],
> +                                operands[2], operands[3]);
> +    DONE;
> +  }
> +)

IMHO this would be the perfect use of expand_vec_perm (operands)
instead of the individual arguments (also the way aarch64 does it),
but your call in the end.
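Untested sketch of what I mean, assuming the helper's signature is
changed to take the operands array instead:

```
(define_expand "vec_perm<mode>"
  [(match_operand:V 0 "register_operand")
   (match_operand:V 1 "register_operand")
   (match_operand:V 2 "register_operand")
   (match_operand:<VINDEX> 3 "vector_perm_operand")]
  "TARGET_VECTOR && GET_MODE_NUNITS (<MODE>mode).is_constant ()"
  {
    riscv_vector::expand_vec_perm (operands);
    DONE;
  }
)
```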

> +/* Return true if VEC is a constant in which every element is in the range
> +   [MINVAL, MAXVAL].  The elements do not need to have the same value.  */
> +
> +static bool
> +const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT maxval)
> +{
> +  if (!CONST_VECTOR_P (vec)
> +      || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> +    return false;
> +
> +  int nunits;
> +  if (!CONST_VECTOR_STEPPED_P (vec))
> +    nunits = const_vector_encoded_nelts (vec);
> +  else if (!CONST_VECTOR_NUNITS (vec).is_constant (&nunits))
> +    return false;
> +
> +  for (int i = 0; i < nunits; i++)
> +    {
> +      rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> +      if (!CONST_INT_P (vec_elem)
> +       || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> +     return false;
> +    }
> +  return true;
> +}
> +
> +/* Return a const_int vector of VAL.  */
> +
> +static rtx
> +gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +{
> +  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
> +  return gen_const_vec_duplicate (mode, c);
> +}
> +

Both functions are taken from aarch64.  I would suggest adding
comments in that respect in case somebody wants to unify that in the
middle-end at some point (and wonders if they really do the same thing).

> +/* This function emits VLMAX vrgather instruction. Emit vrgather.vx/vi when sel
> +   is a const duplicate vector. Otherwise, emit vrgather.vv.  */
> +static void
> +emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
> +{
> +  rtx elt;
> +  insn_code icode;
> +  machine_mode data_mode = GET_MODE (target);
> +  if (const_vec_duplicate_p (sel, &elt))
> +    {
> +      icode = code_for_pred_gather_scalar (data_mode);
> +      sel = elt;
> +    }
> +  else
> +    icode = code_for_pred_gather (data_mode);
> +  rtx ops[] = {target, op, sel};
> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
> +}
> +
> +static void
> +emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
> +{
> +  rtx elt;
> +  insn_code icode;
> +  machine_mode data_mode = GET_MODE (target);
> +  if (const_vec_duplicate_p (sel, &elt))
> +    {
> +      icode = code_for_pred_gather_scalar (data_mode);
> +      sel = elt;
> +    }
> +  else
> +    icode = code_for_pred_gather (data_mode);
> +  rtx ops[] = {target, mask, target, op, sel};
> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
> +}
> +
> +/* Implement vec_perm<mode>.  */
> +
> +void
> +expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
> +{
> +  machine_mode data_mode = GET_MODE (target);
> +  machine_mode sel_mode = GET_MODE (sel);
> +  /* Enforced by the pattern condition.  */

This is supposed to refer to the const-ness of sel_mode?

> +  /* Note: vec_perm indices are supposed to wrap when they go beyond the
> +     size of the two value vectors, i.e. the upper bits of the indices
> +     are effectively ignored.  RVV vrgather instead produces 0 for any
> +     out-of-range indices, so we need to modulo all the vec_perm indices
> +     to ensure they are all in range.  */
> +  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();

Ah, I saw it in the aarch64 version; it makes more sense when arranged
the same way they did it, so I'd suggest using their comment/code order.
Where is the modulo actually happening, though?

> +  /* Check if the two values vectors are the same.  */
> +  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
> +    {
> +      rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
> +      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 0,
> +                                      OPTAB_DIRECT);

Ah, here.  But why not also further down?  It gets clearer when reading
the whole function a second time, but a bit of a strategy comment
upfront would help clear up misunderstandings.
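To make the same-input shortcut concrete, here is a tiny standalone C
model (my own sketch, not patch code; N stands in for nunits and is
assumed to be a power of two, as the AND trick requires):

```c
#include <assert.h>

enum { N = 4 };  /* stand-in for nunits; assumed a power of two */

/* Model of vrgather: out-of-range indices produce 0.  */
static int
vrgather_elt (const int *src, int idx)
{
  return (idx >= 0 && idx < N) ? src[idx] : 0;
}

/* vec_perm with both inputs equal: the index is reduced modulo 2 * N,
   but both halves read the same vector, so the effective index is
   sel % N.  AND with N - 1 computes exactly that modulo when N is a
   power of two, and as a side effect keeps every index inside
   vrgather's valid range, so nothing gets zeroed.  */
static int
perm_same_input_elt (const int *op0, int sel)
{
  return vrgather_elt (op0, sel & (N - 1));
}
```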

Just thinking out loudly as I read it:

> +  rtx max_sel = gen_const_vector_dup (sel_mode, nunits);
> +
> +  /* Step 1: generate a mask that should select second vector.  */
> +  expand_vec_cmp (mask, GEU, sel, max_sel);

So we select everything >= nunits into the mask.

> +  /* Step2: gather a intermediate result for index < nunits,
> +         we don't need to care about the result of the element
> +         whose index >= nunits.  */
> +  emit_vlmax_gather_insn (target, op0, sel);

Then we gather every op0 element indexed by sel into target.  Elements
indexed >= nunits are zeroed by vrgather.

> +
> +  /* Step3: generate a new selector for second vector.  */
> +  rtx tmp = gen_reg_rtx (sel_mode);
> +  rtx ops[] = {tmp, sel, max_sel};
> +  emit_vlmax_insn (code_for_pred (MINUS, sel_mode), RVV_BINOP, ops);

Now we shift the range from [nunits, max_of_mode] to
[0, max_of_mode - nunits].

> +  /* Step4: merge the result of index >= nunits. */
> +  emit_vlmax_masked_gather_mu_insn (target, op1, tmp, mask);
> +}

And now we gather those into the previously masked-out elements
of target.  Where do we actually ensure that indices >= 2 * nunits
are taken modulo and not zeroed?  If it happens somewhere, I didn't
spot it, and it certainly warrants a comment.
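To illustrate my question, here is a standalone C model of the four
steps as I read them (my own sketch, not patch code; N stands in for
nunits).  For selectors in [0, 2 * nunits) it matches the vec_perm
reference, but an index >= 2 * nunits shifts into an index that is
still >= nunits in step 3, so step 4's gather zeroes it instead of
wrapping:

```c
#include <assert.h>

enum { N = 4 };  /* stand-in for nunits */

/* Model of vrgather.vv: out-of-range indices yield 0.  */
static void
gather (int *dst, const int *src, const int *sel)
{
  for (int i = 0; i < N; i++)
    dst[i] = (sel[i] >= 0 && sel[i] < N) ? src[sel[i]] : 0;
}

/* Model of the masked mu variant: only masked lanes are updated.  */
static void
gather_mu (int *dst, const int *src, const int *sel, const int *mask)
{
  for (int i = 0; i < N; i++)
    if (mask[i])
      dst[i] = (sel[i] >= 0 && sel[i] < N) ? src[sel[i]] : 0;
}

/* The four-step lowering as I understand it.  */
static void
lowered_perm (int *target, const int *op0, const int *op1, const int *sel)
{
  int mask[N], tmp[N];
  for (int i = 0; i < N; i++)		/* Step 1: mask = sel >= nunits.  */
    mask[i] = sel[i] >= N;
  gather (target, op0, sel);		/* Step 2: gather from op0.  */
  for (int i = 0; i < N; i++)		/* Step 3: shift the selector.  */
    tmp[i] = sel[i] - N;
  gather_mu (target, op1, tmp, mask);	/* Step 4: merge from op1.  */
}

/* Reference vec_perm semantics: indices wrap modulo 2 * nunits.  */
static void
ref_perm (int *target, const int *op0, const int *op1, const int *sel)
{
  for (int i = 0; i < N; i++)
    {
      int j = sel[i] % (2 * N);
      target[i] = j < N ? op0[j] : op1[j - N];
    }
}
```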

Are we actually testing the wraparound/modulo indices?  I didn't
immediately see it but might have overlooked something.  If so, please
add a comment in the test itself about what we expect to happen, or
scan for an and or so.  Is __builtin_shufflevector undefined for index
values >= 2 * nunits?  That would explain it partially.
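For reference, one way to exercise the wraparound from C would be GCC's
generic __builtin_shuffle, whose selector is, as far as I recall the
docs, taken modulo 2 * nunits for the two-input form (unlike
__builtin_shufflevector, which wants constant in-range indices).
Untested sketch, using generic vector extensions:

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Two-input shuffle via GCC's generic builtin; selector values are
   reduced modulo 2 * nunits, which is exactly the vec_perm wraparound
   a test would need to exercise.  */
static v4si
shuffle2 (v4si a, v4si b, v4si sel)
{
  return __builtin_shuffle (a, b, sel);
}
```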

Regards
 Robin
