+  return (__m128) __builtin_ia32_movss_mask ((__v4sf) __A, (__v4sf) __B,
+               (__v4sf) __W,
delena wrote:
> please try the following:
> if (__U)
>   return __builtin_shuffle(A, B, (0, 5, 6, 7)); // maybe you need to swap A and B
> return W;
> I know that the immediate code will be less optimal, but we can optimize it 
> later.
Any update on this? I currently have a patch (D24653) that removes the
movss/movsd mask intrinsics, since we should be able to implement them purely
with generic shuffles. I can help with the optimization if necessary.
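
For reference, here is a minimal sketch of the semantics under discussion, written with the GCC/Clang generic vector extensions instead of a target builtin. The names `v4sf` and `mask_move_ss_sketch` are hypothetical and not part of either patch. The operand roles (W as passthrough, A supplying the upper three lanes, B supplying the low lane) are my assumption based on the documented behaviour of the masked move-scalar intrinsic.

```c
#include <stdint.h>

/* Hypothetical stand-in for __v4sf: a generic vector of 4 floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Sketch of the masked move-scalar semantics (assumed operand roles):
 *   lane 0     = (u & 1) ? b[0] : w[0]
 *   lanes 1..3 = a[1..3]
 * Only bit 0 of the mask is consulted. */
static inline v4sf mask_move_ss_sketch(v4sf w, uint8_t u, v4sf a, v4sf b)
{
    v4sf r = a;                    /* upper three lanes always come from a */
    r[0] = (u & 1) ? b[0] : w[0];  /* low lane is selected by mask bit 0 */
    return r;
}
```

In this sketch the upper lanes come from A whether or not the mask bit is set, so if that reading of the semantics is right, a plain `return W;` in the unmasked case would probably not be sufficient.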


