https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125880

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #3)
> Ah, no, on Intel only vpbroadcast{d,q} are entirely handled by the load
> ports, vpbroadcast{b,w} still have a port 5 uop in addition to a load uop.
> 
> But on AMD all four broadcast variants appear to be equally cheap.

With AVX512 it might also be possible to use vpbroadcast{b,w} with a
{z} writemask to zero selected lanes.  Of course, this requires mask
register setup, which will likely make the pxor + pinsr combination
faster (the pxor zeroing idiom is usually resolved at the rename stage).
