On Sat, Jun 28, 2014 at 6:53 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> There is always a risk, but then even with builtins I think there was a
> small risk that an RTL optimization would mess things up. It is indeed
> higher if we expose the operation to the optimizers earlier, but it would be
> a bug if an "optimization" replaced a vector operation by something worse.
> Also, I am only proposing to handle the most trivial operations this way,
> not more complicated ones (like v[0]+=s) where we would be likely to fail
> generating the right instruction. And the pragma should ensure that the
> function will always be compiled in a mode where the vector instruction is
> available.
>
> ARM did the same and I don't think I have seen a bug reporting a regression
> about it (I haven't really looked though).

I think the ARM definitions come from a different angle. They are new, so there are no assumed semantics. For the x86 intrinsics, Intel defines that _mm_xxx() generates one of a given set of opcodes if there is a match. If I want to generate a specific code sequence, I use the intrinsics. Otherwise I could already use the vector type semantics myself today.

Don't get me wrong, I like the idea of having the intrinsics optimized. But perhaps not unconditionally, or at least not without a way to prevent it.

I know this will look ugly, but how about a macro __GCC_X86_HONOR_INTRINSICS to enable the current code, with your proposed use of the vector arithmetic in place by default? This wouldn't allow removing support for the built-ins, but it would also open the door to enabling some more risky optimizations by default.
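
To make the suggestion concrete, here is a rough sketch of what such a switch could look like for a single intrinsic. It is not taken from GCC's headers. The names my_m128d and my_mm_add_pd are made up for illustration, and __GCC_X86_HONOR_INTRINSICS is only the macro proposed above, not something GCC defines. I am also assuming that __builtin_ia32_addpd is the built-in behind _mm_add_pd.

```c
/* Hypothetical stand-in for __m128d: two doubles in a 16-byte vector. */
typedef double my_m128d __attribute__((vector_size(16)));

/* Hypothetical stand-in for _mm_add_pd, showing both definitions. */
static inline my_m128d my_mm_add_pd(my_m128d a, my_m128d b)
{
#if defined(__GCC_X86_HONOR_INTRINSICS) && defined(__SSE2__)
    /* Opt-in path: keep the opaque built-in, so the user gets the
       addpd opcode they asked for (assumed built-in name). */
    return (my_m128d)__builtin_ia32_addpd(a, b);
#else
    /* Default path: plain vector arithmetic, which the optimizers
       can see through (fold, reassociate, combine, ...). */
    return a + b;
#endif
}
```

Code that depends on a specific instruction sequence would build with -D__GCC_X86_HONOR_INTRINSICS. Everything else would get the vector arithmetic form by default.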