https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

            Bug ID: 95488
           Summary: Suboptimal multiplication codegen for v16qi
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

cat test.c

---
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
v16qi
foo (v16qi a, v16qi b)
{
    return  a*b;
}
---

gcc -O2 -march=skylake-avx512

---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpunpcklbw      xmm3, xmm0, xmm0
        vpunpcklbw      xmm2, xmm1, xmm1
        vpunpckhbw      xmm0, xmm0, xmm0
        vpunpckhbw      xmm1, xmm1, xmm1
        vpmullw xmm2, xmm2, xmm3
        vpmullw xmm1, xmm1, xmm0
        vmovdqa xmm3, XMMWORD PTR .LC0[rip]
        vpand   xmm0, xmm3, xmm2
        vpand   xmm3, xmm3, xmm1
        vpackuswb       xmm0, xmm0, xmm3
        ret
.LC0:
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
---

icc generates:
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpmovzxbw ymm2, xmm0                                    #5.15
        vpmovzxbw ymm3, xmm1                                    #5.15
        vpmullw   ymm4, ymm2, ymm3                              #5.15
        vpmovwb   xmm0, ymm4                                    #5.15
        vzeroupper                                              #5.15
        ret                                                     #5.15
---
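
As a rough sanity check that the two instruction sequences above compute the same thing, here is a scalar model of a single byte lane. It is only a sketch: the function names are made up for illustration, and the models follow my reading of the listings (vpunpck{l,h}bw with both sources equal duplicates each byte into a 16-bit word, vpmullw keeps the low 16 bits, vpmovwb truncates).

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the GCC sequence.  Interleaving a register with itself
   turns byte v into the 16-bit word (v << 8) | v, i.e. v * 0x0101.  */
uint8_t mul_lane_unpack(uint8_t a, uint8_t b)
{
    uint16_t wa = (uint16_t)((a << 8) | a);
    uint16_t wb = (uint16_t)((b << 8) | b);
    uint16_t prod = (uint16_t)((uint32_t)wa * wb); /* vpmullw: low 16 bits */
    return (uint8_t)(prod & 255);  /* vpand with .LC0; vpackuswb then cannot saturate */
}

/* One lane of the icc sequence: zero-extend, multiply, truncate.  */
uint8_t mul_lane_widen(uint8_t a, uint8_t b)
{
    uint16_t prod = (uint16_t)((uint32_t)a * b);   /* vpmovzxbw + vpmullw */
    return (uint8_t)prod;                          /* vpmovwb truncates */
}

/* Compare both models against plain 8-bit multiplication for every
   pair of inputs and return the number of disagreements.  */
int count_lane_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            uint8_t expect = (uint8_t)(a * b);
            if (mul_lane_unpack((uint8_t)a, (uint8_t)b) != expect
                || mul_lane_widen((uint8_t)a, (uint8_t)b) != expect)
                bad++;
        }
    return bad;
}

/* Convenience wrapper for use from a test driver.  */
void check_lanes(void)
{
    assert(count_lane_mismatches() == 0);
}
```

Both models agree because 0x0101 is congruent to 1 modulo 256, so the duplicated high byte never reaches the low 8 bits of the product. The icc form reaches the same result with fewer instructions and no constant load.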

We can do better in ix86_expand_vecop_qihi; the problem is how to get sign
information for an rtx operand.
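
One observation that may help with the sign question, offered as a sketch and not as a proposed patch: for multiplication, the low 8 bits of the 16-bit product do not depend on whether the byte operands were sign- or zero-extended, so either extension should give the same truncated result. The helper names below are hypothetical and model a single lane only. This says nothing about other operations the expander may handle (right shifts, for instance, would need the correct extension).

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend each byte to 16 bits, multiply, keep the low byte.  */
uint8_t mul_lo8_zext(uint8_t a, uint8_t b)
{
    uint16_t wa = a, wb = b;
    return (uint8_t)((uint32_t)wa * wb);
}

/* Sign-extend each byte to 16 bits (written with bit operations to
   avoid implementation-defined conversions), multiply, keep the low byte.  */
uint8_t mul_lo8_sext(uint8_t a, uint8_t b)
{
    uint16_t wa = (a & 0x80) ? (uint16_t)(0xFF00 | a) : a;
    uint16_t wb = (b & 0x80) ? (uint16_t)(0xFF00 | b) : b;
    return (uint8_t)((uint32_t)wa * wb);
}

/* Count the input pairs for which the two extensions disagree.  */
int count_ext_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (mul_lo8_zext((uint8_t)a, (uint8_t)b)
                != mul_lo8_sext((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

The two agree for every pair because the extension only changes bits 8 and above of each operand, and those bits can only affect bits 8 and above of the product.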
