Re: [PATCH] Fix PR68707, 67323

Alan Lawrence Thu, 17 Dec 2015 07:08:52 -0800

On 17/12/15 10:46, Richard Biener wrote:

On Thu, 17 Dec 2015, Alan Lawrence wrote:

On 16/12/15 15:01, Richard Biener wrote:


The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).


Not sure I follow this. Compared to the variant attached to the PR - we will
now attempt to use load-lanes, if (say) all of the loads are strided, even if
we know we don't support load-lanes (for any of them). That sounds the wrong
way around and I think rather different to what you proposed earlier? (At the
least, the debug message "can use load/store lanes" is potentially misleading,
that's not necessarily the case!)


Ah, indeed.  Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed) which
would change outcome for slp-perm-11.c.  That proposal would not reject
the SLP if there were any strided grouped loads involved.

Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the !STMT_VINFO_STRIDED_P && !vect_load_lanes_supported, which tries to use st2 (and only sort-of works - you get an st2 output, but no ld2, and lots of faff).
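
To make the difference between the two conditions concrete, here is a minimal standalone C sketch. It is not GCC code. It assumes, from the description in this thread, that the condition is evaluated per grouped load and that any load satisfying it keeps the SLP instance, so SLP is cancelled in favour of load/store-lanes only when no load satisfies it. The struct and the function names (load_group, keep_slp_pr68707, keep_slp_posted, cancel_slp) are hypothetical, and the store-lanes guard is assumed to have passed already.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one grouped load of an SLP instance.  */
struct load_group {
    bool strided;              /* models STMT_VINFO_STRIDED_P */
    bool load_lanes_supported; /* models the vect_load_lanes_supported result */
};

/* Condition as on PR68707: keep SLP for a strided load, or for a load
   that cannot use load-lanes.  */
bool keep_slp_pr68707(const struct load_group *g)
{
    return g->strided || !g->load_lanes_supported;
}

/* Condition as in the posted patch: keep SLP only for a non-strided load
   that cannot use load-lanes.  */
bool keep_slp_posted(const struct load_group *g)
{
    return !g->strided && !g->load_lanes_supported;
}

/* Returns true when the SLP instance would be cancelled in favour of
   load/store-lanes, i.e. when no load asks to keep SLP.  Assumes the
   caller has already checked that store-lanes is usable for the store.  */
bool cancel_slp(const struct load_group *loads, size_t n,
                bool (*keep_slp)(const struct load_group *))
{
    for (size_t i = 0; i < n; i++)
        if (keep_slp(&loads[i]))
            return false;
    return true;
}
```

With every load strided and load-lanes unsupported, cancel_slp returns false under keep_slp_pr68707 (SLP is kept) but true under keep_slp_posted (SLP is cancelled even though load-lanes cannot be used), which is the case objected to above.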


I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?)

Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances - currently
SLP instances are cancelled on a one-by-one basis meaning we might
do SLP plus load/store-lanes in the same loop.

I don't see anything inherently wrong with doing both in the same loop. On simple loops, I suspect we'll do better committing to one strategy or the other (tho really it's only the VF required I think?), but then, on such simple loops, there are probably not very many SLP instances!

Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.

Yes, I think this sounds like a good plan for GCC 7. It doesn't require constructing an entire stmt (if you are concerned about the cost of that), and on most targets, probably integrates fairly easily with the expand_vec_perm_const hooks.


--Alan
