>>> On 01.12.17 at 06:45, <kirill.yuk...@gmail.com> wrote:
> On 29 Nov 08:59, Jan Beulich wrote:
>> in an unrelated context I've stumbled across a change of yours
>> from Aug 2014 (revision 213847) where you "extend" the ways
>> of loading zeros into registers. I don't understand why this was
>> done, and the patch submission mail also doesn't give any reason.
>> My point is that simple VEX-encoded vxorps/vxorpd/vpxor with
>> 128-bit register operands ought to be sufficient to zero any width
>> registers, due to the zeroing of the high parts the instructions do.
>> Hence by using EVEX encoded insns it looks like all you do is grow
>> the instruction length by one or two bytes (besides making the
>> source somewhat more complicated to follow). At the very least
>> the shorter variants should be used for -Os imo.
>
> As far as I can recall, this was done since we cannot load zeroes
> into upper 16 MM registers, which are available in EVEX exclusively.
Ah, I did overlook this aspect indeed. I still think the smaller VEX
encoding should then be used for the low 16 registers. Furthermore this

typedef double __attribute__((vector_size(16))) v2df_t;
typedef double __attribute__((vector_size(32))) v4df_t;

void test1(void) {
	register v2df_t x asm("xmm31") = {};
	asm volatile("" :: "v" (x));
}

void test2(void) {
	register v4df_t x asm("ymm31") = {};
	asm volatile("" :: "v" (x));
}

translates to "vxorpd %xmm31, %xmm31, %xmm31" for both functions with
-mavx512vl, yet afaict the instructions would #UD without AVX-512DQ,
which suggests to me that the original intention wasn't fully met.

Jan
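The selection rule argued for in this thread can be sketched in C as follows. This is a minimal illustration, not GCC's i386 backend code: the types isa_flags and zero_insn and the functions pick_zero_insn and format_insn are hypothetical names. It assumes AVX is available and only register-register forms are used (no memory operand). The low 16 registers always get the VEX-encoded vxorps (4 bytes with the 2-byte VEX prefix for xmm0-7, 5 bytes for xmm8-15, whose use as the r/m operand needs VEX.B and hence the 3-byte prefix). The upper 16 need EVEX (6 bytes), where vxorps/vxorpd additionally require AVX-512DQ, while vpxord needs only AVX-512F (plus AVX-512VL for its 128-bit form).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ISA feature flags (illustrative names, not GCC's). */
struct isa_flags {
    bool avx512f;
    bool avx512vl;
    bool avx512dq;
};

/* A chosen zeroing idiom for one vector register. */
struct zero_insn {
    const char *mnemonic; /* NULL if the register cannot be zeroed */
    int width;            /* operand width actually encoded, in bits */
    int length;           /* encoded length in bytes, reg-reg form */
};

/* Pick a zeroing instruction for vector register REGNO (0..31).
   Assumes AVX is available.  Both VEX- and EVEX-encoded 128-bit
   forms clear the upper bits, so the whole register ends up zero.  */
static struct zero_insn pick_zero_insn(int regno, struct isa_flags isa)
{
    struct zero_insn r = { NULL, 0, 0 };

    assert(regno >= 0 && regno < 32);
    if (regno < 16) {
        /* VEX: the 2-byte prefix suffices for xmm0-7; xmm8-15 as the
           r/m operand need VEX.B, hence the 3-byte prefix. */
        r.mnemonic = "vxorps";
        r.width = 128;
        r.length = regno < 8 ? 4 : 5;
        return r;
    }
    if (!isa.avx512f)
        return r; /* xmm16-31 only exist with AVX-512F */

    r.length = 6; /* 4-byte EVEX prefix + opcode + ModRM */
    if (isa.avx512vl && isa.avx512dq) {
        r.mnemonic = "vxorps"; /* EVEX vxorps needs AVX-512DQ */
        r.width = 128;
    } else if (isa.avx512vl) {
        r.mnemonic = "vpxord"; /* 128-bit EVEX form needs AVX-512VL */
        r.width = 128;
    } else {
        r.mnemonic = "vpxord"; /* 512-bit form needs only AVX-512F */
        r.width = 512;
    }
    return r;
}

/* Render the choice in AT&T syntax, e.g. "vxorps %xmm0, %xmm0, %xmm0". */
static void format_insn(char *buf, size_t n, struct zero_insn z, int regno)
{
    const char *pfx = z.width == 512 ? "zmm" : "xmm";

    if (z.mnemonic == NULL) {
        if (n > 0)
            buf[0] = '\0';
        return;
    }
    snprintf(buf, n, "%s %%%s%d, %%%s%d, %%%s%d",
             z.mnemonic, pfx, regno, pfx, regno, pfx, regno);
}
```

With AVX-512VL but not AVX-512DQ (the -mavx512vl case from the test above), this sketch would pick "vpxord %xmm31, %xmm31, %xmm31" rather than the vxorpd that can #UD, and falls back to the %zmm31 form when only AVX-512F is present.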