[Bug target/111449] memcmp (p,q,16) == 0 can be optimized better on ppc64 with vector comparison instructions

cvs-commit at gcc dot gnu.org via Gcc-bugs Fri, 17 Nov 2023 01:20:58 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111449


--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guih...@gcc.gnu.org>:

https://gcc.gnu.org/g:cd295a80c91040fd4d826528c8e8e07fe909ae62

commit r14-5548-gcd295a80c91040fd4d826528c8e8e07fe909ae62
Author: Haochen Gui <guih...@gcc.gnu.org>
Date:   Fri Nov 17 17:12:32 2023 +0800

    rs6000: Enable vector mode for by pieces equality compare

    This patch adds a new expand pattern, cbranchv16qi4, to enable vector
    mode by pieces equality compare on rs6000.  The macro MOVE_MAX_PIECES
    (COMPARE_MAX_PIECES) is set to 16 bytes when EFFICIENT_UNALIGNED_VSX
    is enabled; otherwise it is left unchanged.  The macro STORE_MAX_PIECES
    defaults to the same value as MOVE_MAX_PIECES, so it is now defined
    explicitly to keep its previous value.

    gcc/
            PR target/111449
            * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
            * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate
            insn sequence for V16QImode equality compare.
            * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
            (STORE_MAX_PIECES): Define.

    gcc/testsuite/
            PR target/111449
            * gcc.target/powerpc/pr111449-1.c: New.
            * gcc.dg/tree-ssa/sra-17.c: Add additional options for 32-bit
            powerpc.
            * gcc.dg/tree-ssa/sra-18.c: Likewise.
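
For readers who want to see the kind of source the commit above targets, here is a minimal sketch of the 16-byte equality test named in the bug title. The helper names equal16 and check_equal16 are hypothetical and are not taken from the patch or from pr111449-1.c. The assumption is that, with a compiler containing this commit and a target where EFFICIENT_UNALIGNED_VSX is enabled (for example something like -O2 -mcpu=power8), the comparison can be expanded by pieces in V16QI mode instead of with scalar loads; inspecting the assembly from -S is the way to confirm that on a given setup.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns 1 when the two
   16-byte buffers are equal, 0 otherwise.  Only the == 0 result of
   memcmp is used, which is the equality-only case the bug is about.  */
int
equal16 (const void *p, const void *q)
{
  return memcmp (p, q, 16) == 0;
}

/* Hypothetical sanity check of the helper's semantics.  */
void
check_equal16 (void)
{
  unsigned char a[16], b[16];
  for (int i = 0; i < 16; i++)
    a[i] = b[i] = (unsigned char) i;
  assert (equal16 (a, b));
  b[15] ^= 1;   /* Flip one bit in the last byte.  */
  assert (!equal16 (a, b));
}
```

The sketch only pins down the source-level behavior; which instructions are emitted depends on the compiler version and target options.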

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guih...@gcc.gnu.org>:

https://gcc.gnu.org/g:10615c8a10d6b61e813254924d76be728dbd4688

commit r14-5549-g10615c8a10d6b61e813254924d76be728dbd4688
Author: Haochen Gui <guih...@gcc.gnu.org>
Date:   Fri Nov 17 17:17:59 2023 +0800

    rs6000: Fix regression cases caused by 16-byte by pieces move

    The previous patch enables 16-byte by pieces move.  Originally, a
    16-byte move was implemented via a pattern, and expand_block_move
    performed an optimization on P8 LE that leverages V2DI reversed
    load/store for memory to memory moves.  Now a 16-byte move is
    implemented via by pieces move and is finally split into two DI
    loads/stores.  This patch creates an insn_and_split pattern to
    restore the optimization.

    gcc/
            PR target/111449
            * config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

    gcc/testsuite/
            PR target/111449
            * gcc.target/powerpc/pr111449-2.c: New.
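
As an illustration of the memory to memory case described in the second commit, the following sketch copies exactly 16 bytes. The names move16 and check_move16 are hypothetical and do not come from the patch or from pr111449-2.c. Under the assumption of a little-endian Power8 target (for example -O2 -mcpu=power8), one would expect a compiler with this commit to use a single reversed V2DI load/store pair (presumably lxvd2x/stxvd2x) rather than two DI loads and stores; this is an expectation to verify in the generated assembly, not a guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy exactly 16 bytes
   from SRC to DST.  The buffers must not overlap.  */
void
move16 (void *dst, const void *src)
{
  memcpy (dst, src, 16);
}

/* Hypothetical sanity check: the copy must preserve byte order in
   memory regardless of how the compiler implements it.  */
void
check_move16 (void)
{
  unsigned char src[16], dst[16];
  for (int i = 0; i < 16; i++)
    {
      src[i] = (unsigned char) (i * 3 + 1);
      dst[i] = 0;
    }
  move16 (dst, src);
  assert (memcmp (dst, src, 16) == 0);
}
```

Because a reversed load followed by a reversed store cancels out, the bytes land in their original order, which is what the check asserts.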
