https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488
Bug ID: 95488
Summary: Suboptimal multiplication codegen for v16qi
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Target: x86_64-*-* i?86-*-*

cat test.c
---
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
v16qi
foo (v16qi a, v16qi b)
{
  return a*b;
}
---

gcc -O2 -march=skylake-avx512 generates
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpunpcklbw      xmm3, xmm0, xmm0
        vpunpcklbw      xmm2, xmm1, xmm1
        vpunpckhbw      xmm0, xmm0, xmm0
        vpunpckhbw      xmm1, xmm1, xmm1
        vpmullw         xmm2, xmm2, xmm3
        vpmullw         xmm1, xmm1, xmm0
        vmovdqa         xmm3, XMMWORD PTR .LC0[rip]
        vpand           xmm0, xmm3, xmm2
        vpand           xmm3, xmm3, xmm1
        vpackuswb       xmm0, xmm0, xmm3
        ret
.LC0:
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
---

icc generates
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpmovzxbw ymm2, xmm0                                    #5.15
        vpmovzxbw ymm3, xmm1                                    #5.15
        vpmullw   ymm4, ymm2, ymm3                              #5.15
        vpmovwb   xmm0, ymm4                                    #5.15
        vzeroupper                                              #5.15
        ret                                                     #5.15
---

We can do better in ix86_expand_vecop_qihi; the problem is how to get the sign
info for an rtx operand.
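
On the sign question: for a plain (non-widening) byte multiply only the low 8 bits of each 16-bit product are kept, and those bits depend only on the low 8 bits of the operands. So the way the bytes are widened should not affect the result. The following is a minimal scalar sketch in portable C that checks this exhaustively. The helper names are hypothetical and are not GCC internals; each one models a single lane of one widening choice (zero extension as in vpmovzxbw, sign extension, and the byte-duplicating unpack that GCC currently emits).

```c
#include <assert.h>
#include <stdint.h>

/* One lane of: zero-extend both bytes to 16 bits, multiply, keep low byte.
   Models vpmovzxbw + vpmullw + vpmovwb.  */
static uint8_t mul_lo8_zext(uint8_t a, uint8_t b)
{
    uint32_t wa = a, wb = b;
    return (uint8_t)((wa * wb) & 0xFFu);
}

/* One lane of: sign-extend both bytes, multiply, keep low byte.
   The extension is written out by hand to avoid implementation-defined
   narrowing conversions.  */
static uint8_t mul_lo8_sext(uint8_t a, uint8_t b)
{
    int wa = a >= 128 ? (int)a - 256 : (int)a;
    int wb = b >= 128 ? (int)b - 256 : (int)b;
    /* Conversion of a negative int to uint8_t is defined modulo 256.  */
    return (uint8_t)(wa * wb);
}

/* One lane of: unpack a byte with itself (the word becomes a * 0x0101),
   multiply as 16-bit words, then mask with 255.
   Models vpunpcklbw x, x + vpmullw + vpand.  */
static uint8_t mul_lo8_dup(uint8_t a, uint8_t b)
{
    uint32_t wa = (uint32_t)a * 0x0101u;
    uint32_t wb = (uint32_t)b * 0x0101u;
    uint32_t prod16 = (wa * wb) & 0xFFFFu;   /* vpmullw keeps the low word */
    return (uint8_t)(prod16 & 0xFFu);        /* vpand with 255 */
}

/* Count the input pairs on which the three models disagree.  */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t z = mul_lo8_zext((uint8_t)a, (uint8_t)b);
            uint8_t s = mul_lo8_sext((uint8_t)a, (uint8_t)b);
            uint8_t d = mul_lo8_dup((uint8_t)a, (uint8_t)b);
            if (z != s || z != d)
                bad++;
        }
    return bad;
}

/* Aborts if any of the 65536 input pairs disagrees.  */
static void check_sign_independence(void)
{
    assert(count_mismatches() == 0);
}
```

Calling check_sign_independence() from a test driver exercises all 65536 input pairs. This only models the arithmetic per lane; whether the expander can always pick the zero-extending sequence for the MULT case is a separate question for the target code.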