> Currently we mess up here in two places. One is pattern recognition
> which computes a mask-precision for a bool reduction PHI that's
> inconsistent with that of the latch definition. This is solved by
> iterating the mask-precision computation. The second is that the
> reduction epilogue generation and the code querying support for it
> isn't ready for mask inputs. This should be fixed as well, mainly
> by doing all the epilogue processing on a data type again, for now
> at least. We probably want reduc_mask_{and,ior,xor}_scal optabs
> so we can go the direct IFN path on masks if the target supports
> that.
Why not reuse the regular reduc_{and,...}_scal optabs and allow mask modes,
similar to vec_extract?
I guess the scalar mode might be controversial then, but we already have that
when extracting from masks, where RVV needs to support both QImode and BImode.
(Something I've been wanting to fix for a while...)
> [2/3] adds these optabs and [3/3] is a way to use them but with no
> actual target implementation (I stubbed a x86 one for one case to
> get the code miscompiled^Wexercised).
>
> I wonder if there's any feedback on the epilogue handling, esp. how
> SVE or RVV would be able to handle vector mask reduction to a
> scalar bool in the epilogue. Esp. whether you think there is already
> sufficient functionality that I just didn't see.
We'd need to emulate them as we, generally, only have unary/binary operations
on masks. What we do have is popcount on a mask with a scalar destination, so
"mask_popcount" = "popcount + extract_first".
This itself is of course not exposed as an optab yet. Likely wouldn't be
generally useful as it includes an extract?
Anyway, the reductions could look like:
reduc_xor (mask) = ("mask_popcount" (mask, other_mask, len, ...) & 1)
reduc_ior (mask) = ("mask_popcount" (...) != 0)
reduc_and (mask) = ("mask_popcount" (...) == len)
if I didn't mess up the bit-foo. The comparisons we'd perform in the
scalar domain.
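
To sanity-check the three identities above, here is a minimal scalar sketch in
C. It is only a model, not target code: the mask is assumed to fit in a
uint64_t with one bit per element, and mask_popcount () here is a hypothetical
stand-in for the "popcount + extract_first" sequence, not an existing optab or
intrinsic. The reduc_*_mask names are likewise made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "mask_popcount": count the set bits of MASK
   among the first LEN elements that are also enabled in OTHER_MASK.  */
static unsigned
mask_popcount (uint64_t mask, uint64_t other_mask, unsigned len)
{
  assert (len <= 64);
  /* Bits 0 .. LEN-1 set; avoid the undefined shift by 64.  */
  uint64_t active = (len == 64) ? ~UINT64_C (0) : ((UINT64_C (1) << len) - 1);
  uint64_t bits = mask & other_mask & active;
  unsigned count = 0;
  while (bits)
    {
      bits &= bits - 1;	/* Clear the lowest set bit.  */
      count++;
    }
  return count;
}

/* The comparisons below all happen in the scalar domain.  For
   simplicity every element is treated as enabled (all-ones OTHER_MASK).  */

static bool
reduc_xor_mask (uint64_t mask, unsigned len)
{
  return (mask_popcount (mask, ~UINT64_C (0), len) & 1) != 0;
}

static bool
reduc_ior_mask (uint64_t mask, unsigned len)
{
  return mask_popcount (mask, ~UINT64_C (0), len) != 0;
}

static bool
reduc_and_mask (uint64_t mask, unsigned len)
{
  return mask_popcount (mask, ~UINT64_C (0), len) == len;
}
```

For a four-element mask 0b1011 this gives xor = true (three bits set),
ior = true and and = false. Note that in this model the "== len" form for
reduc_and only holds when all LEN elements are enabled; with a partial
governing mask the popcount would presumably have to be compared against
the number of enabled elements instead.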