https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94343

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to jbeulich from comment #11)
> (In reply to Jakub Jelinek from comment #7)
> > Though, there are other issues.  There is only vpternlog{d,q}, so for
> > V*[QH]Imode we shouldn't pretend we have masking support.
> 
> Why would this be? The element mode doesn't matter at all for bitwise
> operations. Just like there's no VPANDB / VPANDW, but VPANDD/VPANDQ are
> quite fine to use on vectors of QI or HI. Afaict the existence of VPAND{D,Q}
> in AVX512 (as opposed to {,V}PAND in MMX/SSE2/AVX) is merely an oddity
> resulting from EVEX.W handling (besides of course the element width's effect
> on embedded broadcasting).

For masked instructions the element mode is significant: it determines which
bits of the mask register apply to which bits of the destination register.
So, if the masked variant is ever matched (e.g. by combine), the RTL will
describe one operation while the insn that gets emitted performs a different
one.
(define_insn ("one_cmplv64qi2_mask")
     [
        (set (match_operand:V64QI 0 ("register_operand") ("=v"))
            (vec_merge:V64QI (xor:V64QI (match_operand:V64QI 1 ("nonimmediate_operand") ("vm"))
                    (match_operand:V64QI 2 ("vector_all_ones_operand") ("BC")))
                (match_operand:V64QI 3 ("nonimm_or_0_operand") ("0C"))
                (match_operand:DI 4 ("register_operand") ("Yk"))))
    ] ("(TARGET_AVX512F) && ((TARGET_AVX512F) && (TARGET_AVX512BW))")
("vpternlogd\t{$0x55, %1, %0, %0%{%4%}%N3|%0%{%4%}%N3, %0, %1, 0x55}")
     [
        (set_attr ("type") ("sselog"))
        (set_attr ("prefix") ("evex"))
        (set_attr ("mode") ("XI"))
        (set_attr ("mask") ("no"))
    ])
In RTL, the above says that with a last operand (the mask) of, say,
0x5555555555555555ULL, the first 8 bits of the vector will be the result of
the ternlog operation, the next 8 bits will be cleared or left unmodified
(depending on operand 3), and so on.
The instruction actually used for that does something different: it ignores
the upper 48 bits of the mask register, so the low 32 bits of the destination
will be the result of the ternlog operation, the next 32 bits will be cleared
or left unmodified, and so on.
