https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123163
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Manuel López-Ibáñez from comment #2)
> Sure, I see why it is not profitable with N=16.
>
> But why does it decide to vectorize with N=32? If it does become profitable
> with N=32, then how can I tell GCC to also assume that n >= 32 in bar? If it
> is not profitable, then vectorizing it sounds like a bug.

This is not what happens - with N == 32 we no longer unroll the loops at
line 31 and (inlined) at lines 33 and 34, and so we do not elide the vec[]
temporary array. The operations on that temporary array are profitable to
vectorize (even with N == 16 if you forcefully disable unrolling). But I
doubt this is overall a win over eliding the temporary array, aka calling
bar () directly.

That said, I'm not sure what vector code you expect GCC to generate and why
you think it is going to be faster than scalar code. I suggest you play
with vector intrinsics - if you arrive at a profitable vectorization of
bar() I'd be curious to know.
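
For anyone wanting to reproduce the "forcefully disable unrolling" experiment, here is a minimal sketch. It is not the testcase from this report: the functions bar_sum, foo_default and foo_no_unroll, the int element type and the multiply/sum kernel are all hypothetical stand-ins. It only assumes the documented behaviour of `#pragma GCC unroll 1`, which asks GCC not to unroll the loop that follows. Building with something like `gcc -O3 -fopt-info-vec` and comparing the two callers should show whether the vec[] temporary survives and gets vectorized.

```c
#include <stddef.h>

#define N 16 /* hypothetical trip count; the report compares 16 and 32 */

/* Hypothetical stand-in for bar(): sums n ints. */
static int bar_sum(const int *p, size_t n)
{
    int s = 0;
    /* Also request no unrolling here, so the request still applies
       to this loop once it has been inlined into a caller. */
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Caller with default heuristics: GCC is free to unroll the fill loop
   completely, which may let it elide the vec[] temporary. */
int foo_default(const int *a, const int *b)
{
    int vec[N];
    for (size_t i = 0; i < N; i++)
        vec[i] = a[i] * b[i];
    return bar_sum(vec, N);
}

/* Same computation, but the fill loop is marked "do not unroll",
   so it stays a loop and remains a candidate for the vectorizer. */
int foo_no_unroll(const int *a, const int *b)
{
    int vec[N];
#pragma GCC unroll 1
    for (size_t i = 0; i < N; i++)
        vec[i] = a[i] * b[i];
    return bar_sum(vec, N);
}
```

Both callers compute the same value; only the code GCC generates for them is expected to differ, and which one is faster would still have to be measured.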
