http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2012-04-09 11:48:05 UTC ---
Created attachment 27117
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27117
Proposed patch

There are indeed two problems with XOP patterns:

a) duplication of the *sse4_1_mulv2siv2di3 pattern
b) wrong operand order in all (!!!) XOP patterns; the XOP patterns treat
element 0 as the MSB element.

The attached patch solves this by simply removing the fake
xop_mulv2div2di3_{low,high} patterns and expanding to the (fixed)
xop_pmacsdq{h,l} patterns directly. There is simply no need to avoid vpmacsdql
in favor of vpmuldq: for consistency, the patch expands to the xop_pmacsdql
pattern, but gcc figures out that the addition of 0 is unneeded and substitutes
the MAC insn with a plain MUL.

The attached patch does not even try to fix the other intrinsics. Someone
familiar with the AMD documentation should review all of these, since the
documentation (43479.pdf) is somewhat inconsistent (e.g. the figure that
explains VPMADCSSWD is inconsistent with its description).

Since I don't have an XOP processor, I can only eyeball the asm, in this case:

        vpxor   %xmm3, %xmm3, %xmm3
        xorl    %eax, %eax
.L3:
        vpshufd $216, c2(%rax), %xmm1
        vpshufd $216, c3(%rax), %xmm0
        vpmuldq %xmm0, %xmm1, %xmm2
        vpmacsdqh       %xmm3, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm2, e1(%rax,%rax)
        vmovdqa %xmm0, e1+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $2048, %rax
        jne     .L3

Please also note the hoisting of the constant load out of the loop.
