https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <h...@gcc.gnu.org>:

https://gcc.gnu.org/g:4972f97a265c574d51e20373ddefd66576051e5c

commit r14-9171-g4972f97a265c574d51e20373ddefd66576051e5c
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Sun Feb 25 10:21:04 2024 -0800

    x86: Properly implement AMX-TILE load/store intrinsics

    ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block.  With
    _tile_loadconfig implemented as

    extern __inline void
    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _tile_loadconfig (const void *__config)
    {
      __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
    }

    GCC sees:

    (parallel [
      (asm_operands/v ("ldtilecfg   %X0") ("") 0
       [(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
                             (const_int -64 [0xffffffffffffffc0]))
                    [1 MEM[(const void * *)&tile_data]+0 S8 A128])]
       [(asm_input:DI ("m"))]
       (clobber (reg:CC 17 flags))])

    and the memory operand is only 8 bytes wide (DImode, S8 above).  As a
    result, GCC ignores the remaining 56 bytes of the 64-byte (512-bit)
    block.  Implement the ldtilecfg and sttilecfg intrinsics with a pointer
    to XImode so that the whole memory block is honored.

    gcc/ChangeLog:

            PR target/114098
            * config/i386/amxtileintrin.h (_tile_loadconfig): Use
            __builtin_ia32_ldtilecfg.
            (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
            * config/i386/i386-builtin.def (BDESC): Add
            __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
            * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
            IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
            * config/i386/i386.md (ldtilecfg): New pattern.
            (sttilecfg): Likewise.

    gcc/testsuite/ChangeLog:

            PR target/114098
            * gcc.target/i386/amxtile-4.c: New test.
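
The operand-width problem described in the commit message can be
illustrated without AMX hardware.  The sketch below is not the code from
the commit, which adds builtins that use XImode.  It uses the more general
GNU C technique of giving an "m" operand the type of the whole object, with
an empty asm template so that it builds on any GCC-compatible compiler.
The struct name, field names and helper functions are assumptions made for
this example; the field offsets follow the 64-byte configuration layout in
Intel's description of ldtilecfg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout of the 64-byte block read by ldtilecfg.
   Names are hypothetical; only the offsets matter here.  */
struct tile_config
{
  uint8_t palette_id;    /* byte 0 */
  uint8_t start_row;     /* byte 1 */
  uint8_t reserved[14];  /* bytes 2-15, must be zero */
  uint16_t colsb[16];    /* bytes 16-47, bytes per row (tiles 0-7 used) */
  uint8_t rows[16];      /* bytes 48-63, row count (tiles 0-7 used) */
};

_Static_assert (sizeof (struct tile_config) == 64,
                "tile configuration block must be 64 bytes");

/* Narrow operand, as in the old intrinsic: the compiler is told that the
   asm reads one pointer-sized object at CONFIG and nothing beyond it.  */
static inline void
declare_read_narrow (const void *config)
{
  __asm__ volatile ("" :: "m" (*(const void *const *) config));
}

/* Wide operand: the compiler is told that the asm reads the whole
   64-byte object, so stores to any field must happen before it.  */
static inline void
declare_read_whole (const void *config)
{
  __asm__ volatile ("" :: "m" (*(const struct tile_config *) config));
}

/* Hypothetical helper: build a palette 1 configuration for tile 0.
   Palette 1 allows at most 16 rows of 64 bytes each.  */
static struct tile_config
make_tile0_config (uint8_t rows, uint16_t colsb)
{
  struct tile_config cfg;

  assert (rows <= 16 && colsb <= 64);
  memset (&cfg, 0, sizeof cfg);
  cfg.palette_id = 1;
  cfg.colsb[0] = colsb;  /* byte 16, outside the narrow operand */
  cfg.rows[0] = rows;    /* byte 48, outside the narrow operand */
  declare_read_whole (&cfg);
  return cfg;
}
```

With declare_read_narrow, the stores to colsb and rows fall outside the
declared operand, so the compiler may treat them as unrelated to the asm;
with declare_read_whole they are part of the operand and must be
completed first.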
