On Thu, 17 Dec 2015, Alan Lawrence wrote:

> On 16/12/15 15:01, Richard Biener wrote:
> > 
> > The following patch adds a heuristic to prefer store/load-lanes
> > over SLP when vectorizing.  Compared to the variant attached to
> > the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
> > what you've tested).
> 
> Not sure I follow this. Compared to the variant attached to the PR - we will
> now attempt to use load-lanes, if (say) all of the loads are strided, even if
> we know we don't support load-lanes (for any of them). That sounds the wrong
> way around and I think rather different to what you proposed earlier? (At the
> least, the debug message "can use load/store lanes" is potentially misleading,
> that's not necessarily the case!)

Ah, indeed.  Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed) which
would change the outcome for slp-perm-11.c.  That proposal would not reject
the SLP if there were any strided grouped loads involved.
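To make the difference between the two variants concrete, here is a toy model in C.  It is only a sketch: `struct load_info`, `prefer_lanes_over_slp` and the `strided_blocks_lanes` flag are hypothetical names, not the vectorizer's actual code.  The `strided` field stands in for STMT_VINFO_STRIDED_P.  Both variants sit behind the store-lanes check; they differ only in whether a strided load is skipped (the posted patch) or forces SLP to be kept (the earlier proposal).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one grouped load feeding an SLP instance.  */
struct load_info {
  bool strided;          /* models STMT_VINFO_STRIDED_P */
  bool lanes_supported;  /* target supports load-lanes for this group */
};

/* Return true if SLP should be cancelled in favour of load/store-lanes.
   STRIDED_BLOCKS_LANES selects the variant (hypothetical flag):
     false: strided loads are skipped and never veto lanes (posted patch);
     true:  any strided load keeps SLP (earlier proposal).  */
static bool
prefer_lanes_over_slp (bool store_lanes_supported,
                       const struct load_info *loads, size_t n,
                       bool strided_blocks_lanes)
{
  /* Both variants are guarded by the store-lanes check.  */
  if (!store_lanes_supported)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      if (loads[i].strided)
        {
          if (strided_blocks_lanes)
            return false;   /* earlier proposal: keep SLP */
          continue;         /* posted patch: ignore this load entirely */
        }
      if (!loads[i].lanes_supported)
        return false;       /* this load cannot use load-lanes: keep SLP */
    }
  return true;
}
```

With every load strided and load-lanes unsupported, the posted variant returns true (cancel SLP) while the earlier proposal returns false (keep SLP).  That is exactly the case where the "can use load/store lanes" message would be misleading.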

> There are arguments that we want to do less SLP, generally, on ARM/AArch64 but
> I think Wilco's permute cost patch
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of
> achieving that?

Maybe, but it's also a heuristic.  At least if we _statically_ fail to 
SLP due to cost issues then we re-try with interleaving.

> Just my gut feeling at this point - I haven't evaluated this version of the
> patch on any benchmarks etc...

Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances - currently
SLP instances are cancelled on a one-by-one basis, meaning we might
do SLP plus load/store-lanes in the same loop.
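A toy illustration of per-instance versus global cancellation follows.  It assumes each SLP instance has already been classified by some per-instance heuristic.  The function names, the `strategy` enum and the "all instances must prefer lanes" policy in the global variant are all hypothetical choices made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum strategy { USE_SLP, USE_LANES };

/* Per-instance scheme: each instance is cancelled independently, so a
   single loop can end up with some instances SLPed and others using
   load/store-lanes.  */
static void
decide_per_instance (const bool *prefers_lanes, size_t n, enum strategy *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = prefers_lanes[i] ? USE_LANES : USE_SLP;
}

/* Global scheme: decide once, after full SLP analysis, for the whole loop.
   Assumed policy: use lanes only if every instance prefers them.  */
static void
decide_globally (const bool *prefers_lanes, size_t n, enum strategy *out)
{
  bool all_lanes = true;
  for (size_t i = 0; i < n; i++)
    if (!prefers_lanes[i])
      all_lanes = false;
  for (size_t i = 0; i < n; i++)
    out[i] = all_lanes ? USE_LANES : USE_SLP;
}
```

For two instances where only the first prefers lanes, the per-instance scheme yields a mixed loop, whereas the global scheme keeps both as SLP.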

Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.
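As a rough sketch of what such a hook could look like, the code below costs each permutation from its exact selector mask.  It uses a VEC_PERM_EXPR-style convention: entries in [0, nelts) pick from the first input and [nelts, 2*nelts) from the second.  The hook names and the toy costs (identity 0, single-input shuffle 1, two-input shuffle 2) are assumptions, not a real target's numbers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permutation-cost hook.  MASK has NELTS entries; entry i
   selects the source element for lane i.  Toy costs (assumptions):
   identity = 0, single-input shuffle = 1, two-input shuffle = 2.  */
static int
permute_cost (const unsigned *mask, size_t nelts)
{
  bool identity = true, two_inputs = false;
  for (size_t i = 0; i < nelts; i++)
    {
      if (mask[i] != i)
        identity = false;
      if (mask[i] >= nelts)
        two_inputs = true;
    }
  if (identity)
    return 0;
  return two_inputs ? 2 : 1;
}

/* Since the SLP path knows every mask it will emit, it can sum exact
   per-mask costs instead of charging a flat estimate per permute.  */
static int
slp_permutation_cost (const unsigned *const *masks, size_t nmasks,
                      size_t nelts)
{
  int total = 0;
  for (size_t m = 0; m < nmasks; m++)
    total += permute_cost (masks[m], nelts);
  return total;
}
```

The point of the design is that a no-op or single-input permute is not charged the same as a two-input one, which a flat per-permute cost cannot express.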

Richard.
