On Tue, May 19, 2020 at 10:48 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 19 May 2020, Uros Bizjak wrote:
>
> > Hello!
> >
> > The attached patch adds the missing vector zero_extend/sign_extend
> > expanders to allow vectorization of operations between different
> > vector sizes.
> >
> > The patch regresses (progresses?):
> >
> > FAIL: gcc.target/i386/pr92645-4.c scan-tree-dump-times optimized
> > "vec_unpack_lo" 3
> >
> > but comparing the asm before and after the patch, the new code is much better:
> >
> >  .L3:
> > -       vmovdqu (%rsi,%rax), %xmm6
> > -       vpxor   %xmm5, %xmm5, %xmm5
> > -       vmovdqa %ymm5, -32(%rsp)
> > -       vmovdqa %xmm6, -32(%rsp)
> > -       vpmovzxbw       -32(%rsp), %ymm0
> > +       vpmovzxbw       (%rsi,%rax), %ymm0
> >         vpmullw %ymm4, %ymm0, %ymm0
> >         vpaddw  %ymm2, %ymm0, %ymm0
> >         vpsrlw  $8, %ymm0, %ymm0
> >
> > The loop prologue improves even more.
> >
> > (Please note a strange double-save to a stack slot in the old code).
> >
> > Richi, I guess that the testcase you introduced needs some adjustment.
>
> I will deal with the FAIL once you commit the patch.  The testcase
> is for the forwprop code, which indeed also knows how to exercise those
> missing patterns.  IIRC I filed the PR when working on those
> (and may in turn remove the VEC_UNPACK_* support from forwprop again!)
>
> > As discussed in the PR, there are a couple of XFAILs where the
> > compiler is not able to vectorize the code. The named expanders are
> > there, but for the reason explained in PR comment #8, the middle-end
> > doesn't exercise them.
>
> OK, so we should track this in a separate PR?  Can you point to
> the specific expander and the XFAILed testcases there?

Yes, I'll open a new PR and document the current limitation. I will CC
you on the PR.

Thanks,
Uros.
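
For readers following the thread, here is a rough sketch of the kind of
scalar loop whose vectorized body could resemble the asm quoted above
(zero-extend bytes to 16 bits, multiply, add, shift right by 8). It is not
taken from the patch, the PR, or pr92645-4.c; the function name and
parameters are hypothetical, and the code GCC actually generates depends
on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not from the patch or the PR: widen each byte to
   16 bits, scale it, add a bias, and keep the high byte of the result. */
void scale_bias_u8(uint16_t *dst, const uint8_t *src, size_t n,
                   uint16_t scale, uint16_t bias)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];                    /* zero-extend u8 -> u16 */
        uint16_t prod = (uint16_t)(wide * scale);  /* keep the low 16 bits */
        uint16_t sum = (uint16_t)(prod + bias);    /* wraps modulo 2^16 */
        dst[i] = (uint16_t)(sum >> 8);             /* logical shift right by 8 */
    }
}
```

The widening assignment to `wide` is the step that a vector zero_extend
expander would cover when the loop is vectorized; the remaining operations
correspond loosely to the multiply, add, and shift in the quoted loop body.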
