On 17/12/15 10:46, Richard Biener wrote:
On Thu, 17 Dec 2015, Alan Lawrence wrote:
On 16/12/15 15:01, Richard Biener wrote:
The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing. Compared to the variant attached to
the PR, I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).
Not sure I follow this. Compared to the variant attached to the PR, we will
now attempt to use load-lanes if (say) all of the loads are strided, even if
we know we don't support load-lanes (for any of them). That seems the wrong
way around, and I think rather different to what you proposed earlier? (At the
least, the debug message "can use load/store lanes" is potentially misleading,
as that's not necessarily the case!)
Ah, indeed. Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.
I can also do it the other way around (as previously proposed), which
would change the outcome for slp-perm-11.c. That proposal would not reject
the SLP if there were any strided grouped loads involved.
Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on
PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the
!STMT_VINFO_STRIDED_P && !vect_load_lanes_supported approach, which tries to use st2 (and
only sort of works: you get an st2 output, but no ld2, and lots of faff).
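To make the difference between the two predicates concrete, here is a minimal sketch in C. The struct `grouped_load` and the functions `keep_slp_pr68707` and `keep_slp_patch` are hypothetical stand-ins, not GCC's actual data structures or code. The two boolean fields merely mirror `STMT_VINFO_STRIDED_P` and `vect_load_lanes_supported`. Both functions assume the outer guard (store-lanes is usable for the store) has already passed. They answer whether the SLP instance should be kept rather than rejected in favour of load/store-lanes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one grouped load in an SLP instance.  The
   fields only mirror the predicates discussed in the thread; this is
   NOT GCC's real representation.  */
struct grouped_load
{
  bool strided;              /* stands in for STMT_VINFO_STRIDED_P */
  bool load_lanes_supported; /* stands in for vect_load_lanes_supported */
};

/* PR68707 variant: keep SLP if ANY load is strided or cannot use
   load-lanes, i.e. STMT_VINFO_STRIDED_P || !vect_load_lanes_supported.  */
static bool
keep_slp_pr68707 (const struct grouped_load *loads, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (loads[i].strided || !loads[i].load_lanes_supported)
      return true;
  return false;
}

/* Patch variant: keep SLP only if some load is non-strided AND cannot
   use load-lanes, i.e. !STMT_VINFO_STRIDED_P && !vect_load_lanes_supported.
   Strided loads therefore never block the switch to load/store-lanes.  */
static bool
keep_slp_patch (const struct grouped_load *loads, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!loads[i].strided && !loads[i].load_lanes_supported)
      return true;
  return false;
}
```

Consider a group where every load is strided and load-lanes is unsupported. Here `keep_slp_pr68707` keeps SLP, while `keep_slp_patch` drops it in favour of lanes that cannot actually be used for the loads. That is the case where "can use load/store lanes" would be misleading.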
I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?)
BTW, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances. Currently,
SLP instances are cancelled on a one-by-one basis, meaning we might
do SLP plus load/store-lanes in the same loop.
I don't see anything inherently wrong with doing both in the same loop. On
simple loops, I suspect we'll do better committing to one strategy or the other
(though really I think it's only the required VF that matters?), but then, on such simple loops,
there are probably not very many SLP instances!
Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.
Yes, I think this sounds like a good plan for GCC 7. It doesn't require
constructing an entire stmt (if you are concerned about the cost of that), and
on most targets, probably integrates fairly easily with the
expand_vec_perm_const hooks.
--Alan