On Thu, Dec 30, 2021 at 3:45 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> This patch adds basic V2QImode infrastructure and V2QImode arithmetic
> operations (plus, minus and neg).  The patched compiler can emit SSE
> vectorized QImode operations (e.g. PADDB) on partial QImode vectors,
> and also synthesized double HI/LO QImode operations with integer registers.
>
> The testcase:
>
> typedef char __v2qi __attribute__ ((__vector_size__ (2)));
> __v2qi plus (__v2qi a, __v2qi b) { return a + b; }
>
> compiles with -O2 to:
>
>         movl    %edi, %edx
>         movl    %esi, %eax
>         addb    %sil, %dl
>         addb    %ah, %dh
>         movl    %edx, %eax
>         ret
>
> which is much better than what the unpatched compiler produces:
>
>         movl    %edi, %eax
>         movl    %esi, %edx
>         xorl    %ecx, %ecx
>         movb    %dil, %cl
>         movsbl  %dh, %edx
>         movsbl  %ah, %eax
>         addl    %edx, %eax
>         addb    %sil, %cl
>         movb    %al, %ch
>         movl    %ecx, %eax
>         ret
>
> The V2QImode vectorization does not require vector registers, so it can
> be enabled by default also for 32-bit targets without SSE.
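
As a rough illustration of why general-purpose registers are enough here, the sketch below emulates the two-lane byte add on a 16-bit integer and compares it with the same add written using the GCC vector extension. The names `v2qi_u`, `add_vec` and `add_gpr` are hypothetical and not part of the patch. The sketch uses `unsigned char` lanes, rather than the `char` of the testcase, so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-byte vector type, modelled on the testcase's __v2qi
   but with unsigned lanes (an assumption made for this sketch).  */
typedef unsigned char v2qi_u __attribute__ ((__vector_size__ (2)));

/* Reference: let the compiler perform the lane-wise add.  */
uint16_t add_vec (uint16_t a, uint16_t b)
{
  v2qi_u va, vb, vr;
  uint16_t r;
  memcpy (&va, &a, sizeof va);
  memcpy (&vb, &b, sizeof vb);
  vr = va + vb;
  memcpy (&r, &vr, sizeof r);
  return r;
}

/* Emulation: two independent 8-bit adds on the low and high byte of a
   16-bit value, with no carry from the low lane into the high lane.
   This mirrors the addb %sil, %dl / addb %ah, %dh pair in spirit.  */
uint16_t add_gpr (uint16_t a, uint16_t b)
{
  uint8_t lo = (uint8_t) ((a & 0xff) + (b & 0xff));
  uint8_t hi = (uint8_t) ((a >> 8) + (b >> 8));
  return (uint16_t) (lo | (hi << 8));
}

/* Compare both versions over every pair of 16-bit inputs whose
   bytes are drawn from a small set of edge values.  */
void selfcheck (void)
{
  static const uint8_t edge[] = { 0x00, 0x01, 0x7f, 0x80, 0xfe, 0xff };
  const unsigned n = sizeof edge / sizeof edge[0];
  for (unsigned i = 0; i < n * n; i++)
    for (unsigned j = 0; j < n * n; j++)
      {
        uint16_t a = (uint16_t) (edge[i / n] << 8 | edge[i % n]);
        uint16_t b = (uint16_t) (edge[j / n] << 8 | edge[j % n]);
        assert (add_vec (a, b) == add_gpr (a, b));
      }
}
```

Note that `add_gpr (0x01ff, 0x0001)` gives 0x0100, whereas a plain 16-bit add would give 0x0200: the overflow of the low lane must not reach the high lane.
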
>
> The patch also enables vectorized V2QImode sign/zero extends.
>
> The reason for the RFC is several warning failures in
> Wstringop-overflow-*.[Cc] as a result of an unwanted vectorization. I
> tried to sprinkle vect_slp_v2qi_store_align xfails around, but
> unfortunately without success, since I have no idea about the details
> of these tests.
>
> I didn't want to introduce testsuite FAILs, so help with these failing
> tests is greatly appreciated.

This is now fixed in a separate patch.

> Anyway, the above example shows the potential of V2QImode
> vectorization. There are additional similar optimizations possible
> (e.g. shifts with GPRs) in addition to SSE instructions on partial
> V2QI vectors.
>
> Patch is bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> 2021-12-30  Uroš Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog:
>
>     PR target/103861
>     * config/i386/i386.h (VALID_SSE2_REG_MODE): Add V2QImode.
>     (VALID_INT_MODE_P): Ditto.
>     * config/i386/i386.c (ix86_secondary_reload): Handle
>     V2QImode reloads from SSE register to memory.
>     (vector_mode_supported_p): Always return true for V2QImode.
>     * config/i386/i386.md (*subqi_ext<mode>_2): New insn pattern.
>     (*negqi_ext<mode>_2): Ditto.
>     * config/i386/mmx.md (movv2qi): New expander.
>     (movmisalignv2qi): Ditto.
>     (*movv2qi_internal): New insn pattern.
>     (*pushv2qi2): Ditto.
>     (negv2qi2 and splitters): Ditto.
>     (<plusminus:insn>v2qi3 and splitters): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/103861
>     * gcc.dg/store_merging_18.c (dg-options): Add -fno-tree-vectorize.
>     * gcc.dg/store_merging_29.c (dg-options): Ditto.
>     * gcc.target/i386/pr103861.c: New test.
>     * gcc.target/i386/pr92658-avx512vl.c (dg-final):
>     Remove vpmovqb scan-assembler xfail.
>     * gcc.target/i386/pr92658-sse4.c (dg-final):
>     Remove pmovzxbq scan-assembler xfail.
>     * gcc.target/i386/pr92658-sse4-2.c (dg-final):
>     Remove pmovsxbq scan-assembler xfail.
>     * gcc.target/i386/warn-vect-op-2.c (dg-warning): Adjust warnings.

Now pushed to master.

Uros.
