On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>
> These are presently documented as producing a vector with the result in
> element 0, and this is inconsistent with their use in tree-vect-loop.c
> (which on bigendian targets pulls the bits out of the wrong end of the
> vector result). This leads to bugs on bigendian targets - see also
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.
>
> I discounted "fixing" the vectorizer (to read from element 0) and then
> making bigendian targets (whose architectural insn produces the result in
> lane N-1) permute the result vector, as optimization of vectors in RTL seems
> unlikely to remove such a permute and would lead to a performance
> regression.
>
> Instead it seems more natural for the tree code to produce a scalar result
> (producing a vector with the result in lane 0 has already caused confusion,
> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>
> However, this patch preserves the meaning of the optab (producing a result
> in lane 0 on little-endian architectures or N-1 on bigendian), thus
> generally avoiding the need to change backends. Thus, expr.c extracts an
> endianness-dependent element from the optab result to give the result
> expected for the tree code.
>
> Previously posted as an RFC
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
> VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Huh.  Does that ever happen?  Please use a NOP_EXPR instead of a
VIEW_CONVERT_EXPR.

Ok with that change.

Thanks,
Richard.

> Testing:
>    x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
>    aarch64-none-linux-gnu: bootstrap
>    aarch64-none-elf: check-gcc, check-g++
>    arm-none-eabi: check-gcc
>
>    aarch64_be-none-elf: check-gcc, showing
>    FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
>    FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
>    Passes the (previously-failing) reduced testcase on
>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>
>    Have also assembler/stage-1 tested that testcase on PowerPC, also
>    fixed.
>
> gcc/ChangeLog:
>
>    * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
>    extract_bit_field around optab result.
>
>    * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
>    produce scalar not vector.
>
>    * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
>    type for REDUC_{MIN,MAX,PLUS}_EXPR.
>
>    * tree-vect-loop.c (vect_analyze_loop): Update comment.
>    (vect_create_epilog_for_reduction): For direct vector reduction, use
>    result of tree code directly without extract_bit_field.
>
>    * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
>    comment.
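
For readers following the thread, the split described in the quoted text (the optab keeps its endianness-dependent lane, the tree code yields a scalar) can be modelled in plain C roughly as below. This is only an illustrative sketch and not code from the patch or from GCC: the names `model_reduc_plus_optab`, `model_reduc_plus_expr`, `result_lane` and `MODEL_NLANES` are hypothetical, the vector is modelled as a four-element array, and the lanes that do not hold the result are zeroed here purely for the sake of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count for the model; real vector modes vary.  */
enum { MODEL_NLANES = 4 };

/* Lane in which the modelled optab leaves its result: lane 0 on
   little-endian, lane N-1 on big-endian (as in the quoted description).  */
static size_t
result_lane (int big_endian)
{
  return big_endian ? (size_t) MODEL_NLANES - 1 : 0;
}

/* Model of the optab: a vector in, a vector out, with the sum in the
   endianness-dependent lane.  Zeroing the other lanes is an assumption
   made only for this sketch.  Inputs are assumed small enough that the
   signed sum does not overflow.  */
static void
model_reduc_plus_optab (const int32_t *in, int32_t *out, int big_endian)
{
  int32_t sum = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < MODEL_NLANES; i++)
    {
      sum += in[i];
      out[i] = 0;
    }
  out[result_lane (big_endian)] = sum;
}

/* Model of the redefined tree code: a vector in, a scalar out.  The
   element extraction stands in for the extract_bit_field that the
   ChangeLog says expr.c wraps around the optab result.  */
static int32_t
model_reduc_plus_expr (const int32_t *in, int big_endian)
{
  int32_t tmp[MODEL_NLANES];

  model_reduc_plus_optab (in, tmp, big_endian);
  return tmp[result_lane (big_endian)];
}
```

In this model a caller of `model_reduc_plus_expr` never needs to know which lane held the result, which is the property the vectorizer epilogue relies on once it uses the tree code's result directly.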