https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122656
--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So further thoughts.
The intrinsic in question has this signature:
vint32mf2x2_t __riscv_vlseg2e32ff_v_i32mf2x2(const int32_t *rs1,
                                             size_t *new_vl, size_t vl);
And it's supposed to correspond to this insn:
(define_insn "@pred_fault_load<mode>"
  [(set (match_operand:VT 0 "register_operand" "=vr, vr, vd")
        (if_then_else:VT
          (unspec:<VM>
            [(match_operand:<VM> 1 "vector_mask_operand" "vmWc1, Wc1, vm")
             (match_operand 4 "vector_length_operand" " rvl, rvl, rvl")
             (match_operand 5 "const_int_operand" " i, i, i")
             (match_operand 6 "const_int_operand" " i, i, i")
             (match_operand 7 "const_int_operand" " i, i, i")
             (reg:SI VL_REGNUM)
             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
          (unspec:VT
            [(match_operand 3 "pmode_reg_or_0_operand" " rJ, rJ, rJ")
             (mem:BLK (scratch))] UNSPEC_VLEFF)
          (match_operand:VT 2 "vector_merge_operand" " 0, vu, vu")))
   (set (reg:SI VL_REGNUM)
        (unspec:SI
          [(if_then_else:VT
             (unspec:<VM>
               [(match_dup 1) (match_dup 4) (match_dup 5)
                (match_dup 6) (match_dup 7)
                (reg:SI VL_REGNUM)
                (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
             (unspec:VT
               [(match_dup 3) (mem:BLK (scratch))] UNSPEC_VLEFF)
             (match_dup 2))] UNSPEC_MODIFY_VL))]
We emit the initial code for the intrinsic via use_exact_insn.
m_ops[0] will be the output operand, so OK.
m_ops[1] will be the mask operand, so OK.
m_ops[2] will be the merge operand, so OK.
m_ops[3] will be "a", so OK (input memory operand)
m_ops[4] will be "c", the new VL (doesn't seem right)
m_ops[5] will be "b", which also seems wrong (should be in m_ops[4])
m_ops[6] will be the tail policy (should be in m_ops[5])
m_ops[7] will be the mask policy (should be in m_ops[6])
m_ops[8] is the ??? We've run off the end of the array anyway
m_ops[9] is wrongly added as we've already run off the array
I don't see how the insn is supposed to handle the output VL, and as a result
the entries appear to be off by one after that operand is inserted.
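To make the shift concrete, here is a toy C++ model of the operand list. It is
not GCC code: the role names and the helpers pattern_slots, pushed_ops,
first_mismatch, without and check_shift are all made up for illustration.
Treating the unidentified m_ops[8] entry as the value meant for operand 7 is
an assumption, and m_ops[9] is left out because nothing above says what it
holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Roles = std::vector<std::string>;

// Roles the pattern expects at operand indices 0..7, read off the
// match_operand numbers in the define_insn quoted above.
// "op7_const" is a placeholder name for the third const_int operand.
Roles pattern_slots() {
  return {"dest", "mask", "merge", "addr",
          "avl", "tail_policy", "mask_policy", "op7_const"};
}

// Roles in the order the comment reports them landing in m_ops[0..8].
// ASSUMPTION: the "???" entry at index 8 is the value meant for operand 7.
Roles pushed_ops() {
  return {"dest", "mask", "merge", "addr", "new_vl",
          "avl", "tail_policy", "mask_policy", "op7_const"};
}

// Index of the first slot whose role differs, or -1 if the common
// prefix of the two lists agrees everywhere.
int first_mismatch(const Roles &want, const Roles &got) {
  std::size_t n = std::min(want.size(), got.size());
  for (std::size_t i = 0; i < n; ++i)
    if (want[i] != got[i])
      return static_cast<int>(i);
  return -1;
}

// Copy of `ops` with the first occurrence of `role` dropped.
Roles without(Roles ops, const std::string &role) {
  auto it = std::find(ops.begin(), ops.end(), role);
  if (it != ops.end())
    ops.erase(it);
  return ops;
}

// Checks the three observations from the list above in this toy model.
void check_shift() {
  // The lists first disagree at index 4, where new_vl sits in the AVL slot.
  assert(first_mismatch(pattern_slots(), pushed_ops()) == 4);
  // One more operand is pushed than the pattern has slots for.
  assert(pushed_ops().size() == pattern_slots().size() + 1);
  // Dropping new_vl makes every remaining entry line up again.
  assert(without(pushed_ops(), "new_vl") == pattern_slots());
}
```

In this model the only thing wrong is the extra new_vl entry; it says nothing
about where the real expander should put the output VL instead.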
What doesn't make sense to me is that the normal FoF (fault-only-first) load has
the same structure and doesn't blow up. It has a different blob of expansion
code, but it looks like the differences are basically around whether the input
memory is passed as an address or as an actual MEM.
I can make space for the new VL object, but I worry that's going to cause
problems. Consider the fold() method for the builtins. Consider all the blobs
that tell us the index of various objects of interest like tail/mask policy,
etc.
I need help :-) I don't see how the segmented FoF is materially different in
its behavior from the standard FoF case. Of course, maybe the standard FoF
case isn't working either.