https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #17 from avieira at gcc dot gnu.org ---
Before anything else, it might be worth redefining the testcase as something
where the predicate has an effect on the result, for instance:

#include <arm_mve.h>
uint32x4_t test_9() {
  return vdupq_m_n_u32(vdupq_n_u32(0xffffffff), 0, 0xcccc);
}

Next, it might be worth pointing out that the ISA does specify what happens
when a predicate mask does not have all bits set for a specific element.
Basically, the predicate mask operates on a per-byte basis: the 16 bits of the
mask control the 16 bytes of a vector register, one bit per byte.

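So for the above, 0xcccc sets bits 2 and 3 of every group of four mask bits,
which predicates the upper two bytes of each 32-bit element true (they receive
the new value 0) and leaves the lower two bytes at the inactive value 0xFF.
The expected output would therefore be {0x0000FFFF, 0x0000FFFF, 0x0000FFFF,
0x0000FFFF}.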

Having said that, I can see how you'd interpret the ACLE specs as defining such
a mask to be 'UB', but I believe the intent was to make clear that all bits
need to be set if you want to true-predicate the full {32,16}-bit element.
That is the most common use; I can't imagine many users manipulating the mask
in such ways.
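
As an illustration of that common use, the sketch below builds a mask in which
every selected 32-bit element has all four of its bits set. The helper name
full_lane_mask_u32 is hypothetical and not part of arm_mve.h; it only shows the
bit layout.

```c
#include <stdint.h>

/* Hypothetical helper (not part of arm_mve.h): expand one boolean per 32-bit
   element (bit n of lane_bits selects element n) into a 16-bit byte mask in
   which each selected element has all four of its bits set.  */
static uint16_t
full_lane_mask_u32 (unsigned lane_bits)
{
  uint16_t p = 0;
  for (int lane = 0; lane < 4; lane++)
    if (lane_bits & (1u << lane))
      p = (uint16_t) (p | (0xFu << (4 * lane)));
  return p;
}
```

For example, selecting elements 0 and 2 (lane_bits = 0x5) gives a mask whose
low four bits and bits 8 to 11 are set, so both elements are predicated in
full.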

clang seems to follow this behavior, generating an assembly sequence that leads
to the expected output, though it uses vpsel, probably due to some
canonicalization. I'd prefer to be consistent with clang here.
