On 01/30/2018 02:59 AM, Richard Biener wrote:
> This patch tries to deal with the "easy" part of a function ABI,
> the return value location, in vectorization costing. The testcase
> shows that if we vectorize the returned value but the function
> doesn't return in memory or in a vector register but as in this
> case in an integer register pair (reg:TI ax) (bah, ABI details
> exposed late? why's this not a parallel?) we end up spilling
PARALLEL is used when the ABI mandates a value be returned in multiple
places. Typically that happens when the value is returned in different
types of registers (integer, floating point, vector).
Presumably it's not a PARALLEL in this case because the value is
returned entirely in integer registers (the pair starting at %rax).
> The idea is to account for such spilling so if vectorization
> benefits outweigh the spilling we'll vectorize anyway.
That's a pretty serious bleed of the target into the vectorizer. But
we've already deemed that the vectorizer is going to have these target
dependencies. So I won't object on those grounds.
> I think the particular testcase could be fixed in the subreg
> pass basically undoing the vectorization but I realize that
> generally this is a too hard problem and avoiding vectorization
> is better. Still this patch is somewhat fragile in that it
> depends on us "seeing" that the stored to decl is returned
> (see cfun_returns).
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> I'd like to hear opinions on my use of hard_function_value
> and also from other target maintainers. I'm not sure we
> have sufficient testsuite coverage of _profitable_ vectorization
> of a return value. Feel free to add to this for your
Well, hard_function_value is the right way to get at the information. I'm not aware of
any other way to get what you want. We could possibly hide the RTL bits
to avoid GET_MODE and friends within the vectorizer -- your call.
I'm not sure the bits in vect_model_store_cost are right though. ISTM
you want to penalize if and only if the return value is not stored in a
vector-capable location. If it's a PARALLEL and any element is a
suitable vector register, then do not penalize -- the spills are
unavoidable in that case.
So I think you have to iterate over elements in the PARALLEL case to
verify none of them are suitable for holding the vector result.
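
To make the PARALLEL check concrete, here is a rough sketch of the logic I have in mind. It is a toy model only: the toy_* types and functions are hypothetical stand-ins, not GCC's rtx, XVECEXP or register-class APIs, and a real version would walk the actual RTL returned by hard_function_value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL concepts; none of these are GCC's real types.  */
enum toy_reg_class { TOY_INT_REG, TOY_FLOAT_REG, TOY_VECTOR_REG };

enum toy_loc_kind { TOY_LOC_REG, TOY_LOC_PARALLEL };

struct toy_return_loc
{
  enum toy_loc_kind kind;
  enum toy_reg_class reg_class;        /* Used when kind == TOY_LOC_REG.  */
  const struct toy_return_loc *elems;  /* Used when kind == TOY_LOC_PARALLEL.  */
  size_t n_elems;
};

/* Return true if LOC is a vector register, or if any element of a
   PARALLEL-like group is a vector register.  */
static bool
toy_holds_vector (const struct toy_return_loc *loc)
{
  if (loc->kind == TOY_LOC_REG)
    return loc->reg_class == TOY_VECTOR_REG;

  for (size_t i = 0; i < loc->n_elems; i++)
    if (toy_holds_vector (&loc->elems[i]))
      return true;

  return false;
}

/* Penalize the vectorized store only when no part of the return
   location is able to hold the vector result.  */
static bool
toy_should_penalize_store (const struct toy_return_loc *loc)
{
  return !toy_holds_vector (loc);
}
```

With this model, a lone integer register gets the penalty, while a PARALLEL containing at least one vector register does not. CONCAT is deliberately left out since I don't know what the right answer is there.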
I'm not entirely sure what to do with CONCAT. I wasn't immediately
aware it could show up in that context.
Or am I missing something here?
> Ok for trunk?
> 2018-01-30 Richard Biener <rguent...@suse.de>
> PR tree-optimization/84101
> * tree-vect-stmts.c: Include explow.h for hard_function_value.
> (cfun_returns): New helper.
> (vect_model_store_cost): When vectorizing a store to a decl
> we return and the function ABI returns via a non-vector
> register account for the possible spilling that will happen.
> * gcc.target/i386/pr84101.c: New testcase.