On Thu, 17 Dec 2015, Alan Lawrence wrote:

> On 16/12/15 15:01, Richard Biener wrote:
> >
> > The following patch adds a heuristic to prefer store/load-lanes
> > over SLP when vectorizing. Compared to the variant attached to
> > the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
> > what you've tested).
>
> Not sure I follow this. Compared to the variant attached to the PR - we will
> now attempt to use load-lanes, if (say) all of the loads are strided, even if
> we know we don't support load-lanes (for any of them). That sounds the wrong
> way around and I think rather different to what you proposed earlier? (At the
> least, the debug message "can use load/store lanes" is potentially misleading,
> that's not necessarily the case!)
Ah, indeed. Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed), which
would change the outcome for slp-perm-11.c. That proposal would not
reject the SLP if there were any strided grouped loads involved.

> There are arguments that we want to do less SLP, generally, on ARM/AArch64 but
> I think Wilco's permute cost patch
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of
> achieving that?

Maybe, but it's also a heuristic. At least if we _statically_ fail
to SLP due to cost issues, then we re-try with interleaving.

> Just my gut feeling at this point - I haven't evaluated this version of the
> patch on any benchmarks etc...

Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances. Currently
SLP instances are cancelled on a one-by-one basis, meaning we might
do SLP plus load/store-lanes in the same loop.

Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations; the SLP path in theory knows
exactly which ones it will generate.

Richard.