This patch tries to deal with the "easy" part of a function ABI, the return value location, in vectorization costing. The testcase shows that if we vectorize the returned value but the function doesn't return in memory or in a vector register but as in this case in an integer register pair (reg:TI ax) (bah, ABI details exposed late? why's this not a parallel?) we end up spilling badly.
The idea is to account for such spilling so if vectorization benefits outweight the spilling we'll vectorize anyway. I think the particular testcase could be fixed in the subreg pass basically undoing the vectorization but I realize that generally this is a too hard problem and avoiding vectorization is better. Still this patch is somewhat fragile in that it depends on us "seeing" that the stored to decl is returned (see cfun_returns). Bootstrap & regtest running on x86_64-unknown-linux-gnu. I'd like to hear opinions on my use of hard_function_value and also from other target maintainers. I'm not sure we have sufficient testsuite coverage of _profitable_ vectorization of a return value. Feel free to add to this for your target. Ok for trunk? Thanks, Richard. 2018-01-30 Richard Biener <rguent...@suse.de> PR tree-optimization/84101 * tree-vect-stmts.c: Include explow.h for hard_function_value. (cfun_returns): New helper. (vect_model_store_cost): When vectorizing a store to a decl we return and the function ABI returns via a non-vector register account for the possible spilling that will happen. * gcc.target/i386/pr84101.c: New testcase. Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c (revision 257139) +++ gcc/tree-vect-stmts.c (working copy) @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. #include "tree-cfg.h" #include "tree-ssa-loop-manip.h" #include "cfgloop.h" +#include "explow.h" #include "tree-ssa-loop.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" @@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt "prologue_cost = %d .\n", inside_cost, prologue_cost); } +/* Returns true if the current function returns DECL. */ + +static bool +cfun_returns (tree decl) +{ + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) + { + greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)); + if (ret && gimple_return_retval (ret) == decl) + return true; + } + return false; +} + /* Function vect_model_store_cost Models cost for stores. In the case of grouped accesses, one access @@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm vec_to_scalar, stmt_info, 0, vect_body); } + /* When vectorizing an SLP store into the function result assign + a penalty if the function returns in a non-vector register. + In this case we assume we'll end up with having to spill the + vector result and do element loads. */ + if (slp_node) + { + tree base = get_base_address (dr->ref); + if (base + && (TREE_CODE (base) == RESULT_DECL + || (DECL_P (base) && cfun_returns (base)))) + { + rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); + if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg))) + || GET_CODE (reg) == PARALLEL + || GET_CODE (reg) == CONCAT) + { + /* Spill. */ + inside_cost += record_stmt_cost (body_cost_vec, + ncopies, vector_store, + stmt_info, 0, vect_body); + /* Element loads. */ + unsigned int assumed_nunits = vect_nunits_for_cost (vectype); + inside_cost += record_stmt_cost (body_cost_vec, + ncopies * assumed_nunits, + scalar_load, + stmt_info, 0, vect_body); + } + } + } + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_store_cost: inside_cost = %d, " Index: gcc/testsuite/gcc.target/i386/pr84101.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp2-details" } */ + +typedef struct uint64_pair uint64_pair_t ; +struct uint64_pair +{ + unsigned long w0 ; + unsigned long w1 ; +} ; + +uint64_pair_t pair(int num) +{ + uint64_pair_t p ; + + p.w0 = num << 1 ; + p.w1 = num >> 1 ; + + return p ; +} + +/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */