On Wed, Oct 12, 2016 at 10:29 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, Oct 12, 2016 at 9:12 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> Given below test case, >>> int foo (unsigned short a[], unsigned int x) >>> { >>> unsigned int i; >>> for (i = 0; i < 1000; i++) >>> { >>> x = a[i]; >>> a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0); >>> } >>> return x; >>> } >>> >>> it now can be vectorized on AArch64, but generated assembly is way from >>> optimal: >>> .L4: >>> ldr q4, [x3, x1] >>> add w2, w2, 1 >>> cmp w2, w0 >>> ushll v1.4s, v4.4h, 0 >>> ushll2 v0.4s, v4.8h, 0 >>> add v3.4s, v1.4s, v6.4s >>> add v2.4s, v0.4s, v6.4s >>> cmhi v1.4s, v1.4s, v5.4s >>> cmhi v0.4s, v0.4s, v5.4s >>> and v1.16b, v3.16b, v1.16b >>> and v0.16b, v2.16b, v0.16b >>> xtn v2.4h, v1.4s >>> xtn2 v2.8h, v0.4s >>> str q2, [x3, x1] >>> add x1, x1, 16 >>> bcc .L4 >>> >>> The vectorized loop has 15 instructions, which can be greatly simplified by >>> turning cond_expr into max_expr, as below: >>> .L4: >>> ldr q1, [x3, x1] >>> add w2, w2, 1 >>> cmp w2, w0 >>> umax v0.8h, v1.8h, v2.8h >>> add v0.8h, v0.8h, v2.8h >>> str q0, [x3, x1] >>> add x1, x1, 16 >>> bcc .L4 >>> >>> This patch addresses the issue by adding new vectorization pattern. >>> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> So the COND_EXPRs are generated this way by if-conversion, right? I > Though ?: is used in source code, yes, it is if-conv regenerated COND_EXPR. >> believe that >> the MAX/MIN_EXPR form is always preferrable and thus it looks like >> if-conversion >> might want to either directly generate it or make sure to fold the >> introduced stmts >> (and have a match.pd pattern catching this). > Hmm, I also noticed saturation cases which should be better > transformed before vectorization in scalar optimizers. But this case > is a bit different because there is additional computation involved > other than type conversion. We need to prove the computation can be > done in either large or small types. It is quite specific case and I > don't see good (general) solution in if-conv. Vect-pattern looks like > a natural place doing this. I am also looking at general saturation > cases, but this one is different?
(vect-patterns should go away ...) But as if-conversion results may also prevail for scalar code doing the pattern in match.pd would be better - that is, "apply" the pattern already during if-conversion. Yes, if-conversion fails to fold the stmts it generates (it only uses generic folding on the trees it builds - it can need some TLC here). Richard. > Thanks, > bin >> >> Richard. >> >>> Thanks, >>> bin >>> >>> 2016-10-11 Bin Cheng <bin.ch...@arm.com> >>> >>> * tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New. >>> (vect_vect_recog_func_ptrs): New element for above pattern. >>> * tree-vectorizer.h (NUM_PATTERNS): Increase by 1. >>> >>> gcc/testsuite/ChangeLog >>> 2016-10-11 Bin Cheng <bin.ch...@arm.com> >>> >>> * gcc.dg/vect/vect-umax-modify-pattern.c: New test. >>> * gcc.dg/vect/vect-umin-modify-pattern.c: New test.