Richard Biener <[email protected]> writes:
> On Wed, 18 Feb 2026, Richard Biener wrote:
>
>> On Wed, 18 Feb 2026, Richard Sandiford wrote:
>> 
>> > "Robin Dapp" <[email protected]> writes:
>> > >> I'd think a specific flag that indicates "never elide" for the
>> > >> vec_predicate could help for that case.  I just wonder how to properly
>> > >> annotate the predicate.  I'd be hesitant to add even more
>> > >> arguments/operands, even 5 as I proposed is not currently supported out
>> > >> of the box.  Maybe an otherwise unused RTL flag can be repurposed?  That
>> > >> might be too implicit, though?
>> > >
>> > > Hmm, on the other hand, even for riscv we don't want the predicate to be
>> > > elided.  What I'd like is provably dead and side-effect free instructions
>> > > (zero length, mask all false) to be deleted.  Our actual vector insns are
>> > > all described with a predicate and won't match without it.
>> > >
>> > > What we do is have the autovec expanders use unpredicated patterns.  At
>> > > the first split they are adorned with our unspec predicate.  But that
>> > > mechanism should go away and I'd much rather start out with predicates
>> > > during expansion, at least for the optimizations that combine et al. can
>> > > handle themselves.
>> > >
>> > > So my suggestion would instead be to set up the generic code in a way
>> > > that "vec_predicate" would never be elided.
>> > 
>> > Yeah.  Or at least, not as part of "normal" simplification.  Perhaps we
>> > could still have a pass that tries to eliminate predication, if that turns
>> > out to be useful.
>> > 
>> > But I can't think of any cases off-hand where nested RTL operations are
>> > interpreted contextually.  And IMO that's a good thing that we should
>> > try to preserve.  In principle, given:
>> > 
>> >   (if_then_else (...A...) (...B...) (...C...))
>> > 
>> > it ought to be possible to preevaluate A, B or C in a register without
>> > changing semantics.  It's true that the target might not provide patterns
>> > to set A, B or C directly to a register.  But semantic correctness
>> > shouldn't rely on a failure to match.
>> > 
>> > Same idea for vec_merge.
>> > 
>> > So it's beginning to feel to me like vec_predicate should be the
>> > top-level operation, with the vectors as operands, rather than be
>> > something that is nested in another expression.  That means that
>> > we won't be able to reuse if_then_else code.  But it's sounding
>> > increasingly like that code wouldn't Just Work anyway.
>> > 
>> > I also wonder whether, rather than:
>> > 
>> >   (vec_predicate .... (plus A B))
>> > 
>> > we should instead encode "plus", A and B as direct operands:
>> > 
>> >   (vec_predicate plus [A B] ...)
>> > 
>> > The code could be stored in the "u2" field of the rtx.  And putting
>> > the [A B] first would fit better with existing assumptions, since IIRC
>> > XVEC (x, n) for n > 0 doesn't occur outside of build-time generators.
>> > 
>> > This again avoids contextual interpretation.  A predicated plus is not
>> > equivalent to taking an existing unpredicated plus and predicating it,
>> > and vice versa.
>> 
>> So besides if_then_else there's also cond_exec.  The question is
>> whether we'd want to have
>> 
>>  (set (reg:..) (vec_predicate plus [A B]..))
>> 
>> or
>> 
>>  (cond_exec (vec_predicate ...) (set (...) (plus ...)))
>> 
>> now, we'd have to introduce "else value" and also have it
>> do per vector lane enablement.  But maybe the nested (set ...)
>> is too awkward to deal with?
>
> In particular I wonder how we want to handle predicated stores?

Yeah, was just going to reply to your earlier message saying
something similar.  (cond_exec ...) might be a stretch for something
like a predicated addition, since on SVE the set is unconditional:
it's always a full register write regardless of the predicate.
Having the predication on the rhs seems more accurate there.
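
To make the distinction concrete, here is a rough toy model in Python
(not GCC code).  The function names and the explicit "else value" operand
are assumptions for illustration only; lanes are plain list elements.
With the predication on the rhs, every lane of the destination is written;
with a per-lane cond_exec, inactive lanes keep whatever the destination
held before.

```python
def predicated_rhs(pred, a, b, else_value):
    """Toy model of (set dest (vec_predicate plus [a b] ...)).

    The set itself is unconditional: each lane of the result is either
    a + b (active lane) or the else value (inactive lane).
    """
    return [x + y if p else e for p, x, y, e in zip(pred, a, b, else_value)]


def per_lane_cond_exec(pred, a, b, dest):
    """Toy model of a hypothetical per-lane cond_exec around (set dest (plus a b)).

    Inactive lanes are not written at all, so they keep the old dest value.
    """
    return [x + y if p else d for p, x, y, d in zip(pred, a, b, dest)]


pred = [True, False]
full_write = predicated_rhs(pred, [1, 2], [10, 20], else_value=[0, 0])
partial_write = per_lane_cond_exec(pred, [1, 2], [10, 20], dest=[5, 6])
```

In this model the two forms only differ in where the inactive lanes get
their value from, which is why the rhs form seems the closer fit for a
register destination.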

But we lack a good way of representing predicated stores.  Currently
SVE uses a read-modify-write of memory, but of course that isn't
accurate, since rmw would fault on unmapped addresses.  A per-lane
cond_exec set could be good for that.
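
A small Python sketch of why the rmw form is not accurate (again a toy
model, not GCC code; `Memory`, `Fault`, `store_rmw` and `store_per_lane`
are made-up names, and memory is modelled as a set of mapped addresses):

```python
class Fault(Exception):
    """Raised when the toy memory is accessed at an unmapped address."""


class Memory:
    def __init__(self, mapped):
        # Only addresses in `mapped` exist; everything else faults.
        self.cells = {addr: 0 for addr in mapped}

    def load(self, addr):
        if addr not in self.cells:
            raise Fault(addr)
        return self.cells[addr]

    def store(self, addr, value):
        if addr not in self.cells:
            raise Fault(addr)
        self.cells[addr] = value


def store_rmw(mem, base, values, pred):
    """Read-modify-write: touches every lane's address, active or not."""
    old = [mem.load(base + i) for i in range(len(values))]
    merged = [v if p else o for v, p, o in zip(values, pred, old)]
    for i, m in enumerate(merged):
        mem.store(base + i, m)


def store_per_lane(mem, base, values, pred):
    """Per-lane conditional store: only active lanes access memory."""
    for i, (v, p) in enumerate(zip(values, pred)):
        if p:
            mem.store(base + i, v)


values = [7, 8, 9, 10]
pred = [True, True, False, False]

# Only addresses 0 and 1 are mapped; lanes 2 and 3 are inactive.
mem = Memory(mapped=range(2))
store_per_lane(mem, 0, values, pred)

try:
    store_rmw(Memory(mapped=range(2)), 0, values, pred)
    rmw_faulted = False
except Fault:
    rmw_faulted = True
```

Here the per-lane store succeeds because the inactive lanes never touch
memory, while the rmw description faults on the unmapped addresses even
though those lanes are masked off.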

The same problem occurs for predicated loads.  (vec_predicate mem ...)
would in principle work there, but hiding a mem would be a big change,
and would raise the question of where the MEM_ATTRs would go.  Urgh...

Thanks,
Richard
