Re: loading of zeros into {x,y,z}mm registers

Jakub Jelinek Fri, 01 Dec 2017 04:19:28 -0800

On Fri, Dec 01, 2017 at 05:08:40AM -0700, Jan Beulich wrote:
> >>> On 01.12.17 at 06:45, <kirill.yuk...@gmail.com> wrote:
> > On 29 Nov 08:59, Jan Beulich wrote:
> >> in an unrelated context I've stumbled across a change of yours
> >> from Aug 2014 (revision 213847) where you "extend" the ways
> >> of loading zeros into registers. I don't understand why this was
> >> done, and the patch submission mail also doesn't give any reason.
> >> My point is that simple VEX-encoded vxorps/vxorpd/vpxor with
> >> 128-bit register operands ought to be sufficient to zero any width
> >> registers, due to the zeroing of the high parts the instructions do.
> >> Hence by using EVEX encoded insns it looks like all you do is grow
> >> the instruction length by one or two bytes (besides making the
> >> source somewhat more complicated to follow). At the very least
> >> the shorter variants should be used for -Os imo.
> > As far as I can recall, this was done since we cannot load zeroes
> > into upper 16 MM registers, which are available in EVEX exclusively.
> 
> Ah, I did overlook this aspect indeed. I still think the smaller VEX
> encoding should then be used for the low 16 registers.


If you have a testcase where that is not the case, please file a PR with it.

> Furthermore this
> 
> typedef double __attribute__((vector_size(16))) v2df_t;
> typedef double __attribute__((vector_size(32))) v4df_t;
> 
> void test1(void) {
>       register v2df_t x asm("xmm31") = {};
>       asm volatile("" :: "v" (x));
> }
> 
> void test2(void) {
>       register v4df_t x asm("ymm31") = {};
>       asm volatile("" :: "v" (x));
> }
> 
> translates to "vxorpd %xmm31, %xmm31, %xmm31" for both
> functions with -mavx512vl, yet afaict the instructions would #UD
> without AVX-512DQ, which suggests to me that the original
> intention wasn't fully met.

This is indeed a bug; please file a PR.  We should IMHO just use
vpxorq in that case, which needs only AVX512VL and not DQ.
Of course, if DQ is available, we should use vxorpd.
Working on a fix.
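
To illustrate the selection described above, here is a rough sketch in C.
It is not the actual i386 backend code: the function name zero_idiom and
its parameters are made up for illustration, and it assumes -mavx512vl is
in effect and that the operand is an xmm or ymm register.

```c
#include <stdbool.h>

/* Hypothetical helper, NOT GCC code: pick the mnemonic used to zero
   xmm/ymm register number REGNO, assuming AVX512VL is enabled.  */
static const char *
zero_idiom (int regno, bool have_avx512dq)
{
  /* Registers 0..15 are reachable with the shorter VEX encoding,
     so plain vxorpd works there without AVX512DQ.  */
  if (regno < 16)
    return "vxorpd";

  /* Registers 16..31 need EVEX.  EVEX vxorpd requires AVX512DQ;
     vpxorq on xmm/ymm needs only AVX512VL.  */
  return have_avx512dq ? "vxorpd" : "vpxorq";
}
```

With this, zero_idiom (31, false) gives "vpxorq" for the xmm31/ymm31
testcases quoted above, rather than the vxorpd that would #UD without DQ.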

        Jakub
