In cases where there isn't an equivalent instruction, you would have a fallback
that does the operation in a non-vectorized fashion. So if you used the AVX2
gather instruction, the SSE fallback would just loop over the elements of the
SIMD vector and load them one by one.
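A minimal sketch of that fallback idea in C++. The 8-lane `Int32x8` type and the `gather` function are hypothetical names for illustration, not from any existing library. When the compiler targets AVX2 (the `__AVX2__` macro is defined), the sketch uses the real `_mm256_i32gather_epi32` intrinsic. Otherwise it loops over the lanes one at a time:

```cpp
#include <array>
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical 8-lane int32 SIMD vector type (illustrative only).
struct Int32x8 {
    std::array<int, 8> lanes;
};

// Hypothetical gather: out.lanes[i] = base[idx.lanes[i]] for every lane.
inline Int32x8 gather(const int* base, const Int32x8& idx) {
    Int32x8 out{};
#if defined(__AVX2__)
    // AVX2 path: a single hardware gather instruction (vpgatherdd).
    __m256i vidx = _mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(idx.lanes.data()));
    __m256i v = _mm256_i32gather_epi32(base, vidx, 4);  // scale = sizeof(int)
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.lanes.data()), v);
#else
    // SSE-only fallback: no gather instruction, so load each lane separately.
    for (std::size_t i = 0; i < out.lanes.size(); ++i) {
        out.lanes[i] = base[idx.lanes[i]];
    }
#endif
    return out;
}
```

The instruction set is chosen at compile time here, so one call site builds for both targets. Picking the best path per CPU from a single binary would need runtime dispatch instead.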

No doubt you would not be able to write code once that _optimally_ uses all the
available power of SSE2/3/4, AVX, AVX2, and AVX-512 in one go, but I think
many cases would come out quite well. C# does something like this with JIT
intrinsics for a subset of SIMD operations. It is very nice to use, but
unfortunately that subset of operations is a bit too small for many use cases.
