In cases where the target instruction set has no equivalent instruction, you would have a fallback that does the operation in a non-vectorized fashion. So if you used the gather instruction, the SSE fallback would just loop over the elements of the SIMD vector and load them one by one.
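To make the gather case concrete, here is a rough sketch of what such a fallback could look like. The names `F32x8`, `I32x8`, and `gather` are made up for illustration and do not come from any existing library; the only real API used is the AVX2 intrinsic `_mm256_i32gather_ps` and its load/store companions. It assumes the choice is made at compile time via the `__AVX2__` macro, which is just one possible way to dispatch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical 8-lane vector types, used here only for illustration.
using F32x8 = std::array<float, 8>;
using I32x8 = std::array<std::int32_t, 8>;

// Loads base[idx[i]] into lane i for each of the 8 lanes.
inline F32x8 gather(const float* base, const I32x8& idx) {
    F32x8 out{};
#if defined(__AVX2__)
    // AVX2 has a hardware gather: one instruction fetches all 8 lanes.
    const __m256i vi =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx.data()));
    // Scale is 4 because indices count floats, not bytes.
    const __m256 v = _mm256_i32gather_ps(base, vi, 4);
    _mm256_storeu_ps(out.data(), v);
#else
    // No gather instruction available (e.g. plain SSE): fall back to
    // loading the elements one at a time.
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = base[idx[i]];
    }
#endif
    return out;
}
```

Calling code would look the same either way, e.g. `F32x8 r = gather(table, idx);`, and only the speed differs depending on what the compiler was told to target.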
No doubt you would not be able to write code once that _optimally_ uses all the available power of SSE2/3/4, AVX, AVX2, and AVX512 in one go, but I think the results should be quite good in many cases. C# does something like this with JIT intrinsics for a subset of SIMD operations. It is very nice to use, but unfortunately the subset of operations is a bit too small for many use cases.
