On Fri, Feb 28, 2020 at 6:15 PM H.J. Lu <[email protected]> wrote:
>
> On Fri, Feb 28, 2020 at 4:16 PM Jeff Law <[email protected]> wrote:
> >
> > On Thu, 2020-02-27 at 06:50 -0800, H.J. Lu wrote:
> > >
> > > How about this? If it looks OK, I will post the whole patch set.
> > It's better. I'm guessing the two cases that were previously handled with
> > vextract/vbroadcast aren't supposed to happen? They're caught here IIUC:
> >
> > > + /* NB: To move xmm16-xmm31/ymm16-ymm31 registers without AVX512VL,
> > > + we can only use zmm register move without memory operand. */
> > > + if (evex_reg_p
> > > + && !TARGET_AVX512VL
> > > + && GET_MODE_SIZE (mode) < 64)
> > > + {
> > > + if (memory_operand (operands[0], mode)
> > > + || memory_operand (operands[1], mode))
> > > + gcc_unreachable ();
> > >
> >
> > If they truly can't happen, that's fine. My worry is I don't see changes to
> > the operand predicates or constraints which would avoid this case. Is it
> > prevented by the mode iterator on the operands? Again, just want to make
> > sure
> > I understand why the vextract/vbroadcast stuff isn't in the new code.
>
> There are no GCC testcases to show that they are actually ever used. That is
> why I removed them and added gcc_unreachable ().
This is covered by the testcases I added:
[hjl@gnu-cfl-2 gcc]$ cat /tmp/x.c
#include <x86intrin.h>
extern __m128 d;
void
foo1 (__m128 x)
{
register __m128 xmm16 __asm ("xmm16") = x;
asm volatile ("" : "+v" (xmm16));
d = xmm16;
}
[hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512 /tmp/x.c -S
[hjl@gnu-cfl-2 gcc]$ gcc -O2 -march=skylake-avx512 -mno-avx512vl /tmp/x.c -S
/tmp/x.c: In function ‘foo1’:
/tmp/x.c:8:19: error: register specified for ‘xmm16’ isn’t suitable
for data type
8 | register __m128 xmm16 __asm ("xmm16") = x;
| ^~~~~
[hjl@gnu-cfl-2 gcc]$
GCC doesn't allow xmm16-xmm31/ymm16-ymm31 without AVX512VL since
ix86_hard_regno_mode_ok has
/* AVX512VL allows sse regs16+ for 128/256 bit modes. */
if (TARGET_AVX512VL
&& (mode == OImode
|| mode == TImode
|| VALID_AVX256_REG_MODE (mode)
|| VALID_AVX512VL_128_REG_MODE (mode)))
return true;
/* xmm16-xmm31 are only available for AVX-512. */
if (EXT_REX_SSE_REGNO_P (regno))
return false;
The vextract/vbroadcast stuff is dead code.
> > I'm doing a little assuming that the <ssescalarsize> bits in the old code
> > are
> > mapped correctly to the 32/64 suffixes on the opcodes in the new version.
> >
> > I'm also assuming that mapping of "size" in the argument to ix86_get_ssemov
> > to
> > the operand modifiers g, t, and x are right. I'm guessing the operand
> > modifiers weren't needed in the original because we had the actual operand
> > and
> > could look at it to get the right modifier. In the evex, but not avx512vl
> > case
> > those are forced to a g modifier which seems to match the original.
> >
> > Are we going to need further refinements to
> > ix86_output_ssemov/ix86_get_ssemov?
> > If so, then I'd suggest the next patch be those patterns which don't require
> > further refinements to ix86_output_ssemov.
>
> 4 patches don't require changes in ix86_output_ssemov/ix86_get_ssemov:
>
> https://gitlab.com/x86-gcc/gcc/-/commit/426f2464abb80b97b8533f9efa15bbe72e6aa888
> https://gitlab.com/x86-gcc/gcc/-/commit/ec5b40d77f7a4424935275f1a7ccedbce83b6f54
> https://gitlab.com/x86-gcc/gcc/-/commit/92fdd98234984f86b66fb5403dd828661cd7999f
> https://gitlab.com/x86-gcc/gcc/-/commit/f8fa5e571caf6740b36d042d631b4ace11683cd7
>
> I can combine them into a single patch.
>
> Other 5 patches contain a small change to ix86_output_ssemov:
>
> https://gitlab.com/x86-gcc/gcc/-/commit/b1746392e1d350d689a80fb71b2c72f909c20f30
> https://gitlab.com/x86-gcc/gcc/-/commit/14c3cbdbdcc36fa1edea4572b89a039726a4e2bc
> https://gitlab.com/x86-gcc/gcc/-/commit/69c8c928b26242116cc261a9d2f6b1265218f1d3
> https://gitlab.com/x86-gcc/gcc/-/commit/04335f582f0b281d5f357185d154087997fd7cfd
> https://gitlab.com/x86-gcc/gcc/-/commit/64f6a5d6d3405331d9c02aaae0faccf449d6647a
>
> Should I made the change and submit them for review?
I am preparing the new patch set.
> > If no further refinements to ix86_output_ssemov/ix86_get_ssemov are
> > required,
> > then I think you can just send the rest of the pattern changes in a single
> > unit.
> >
> > jeff
> >
--
H.J.