Should code that calls the builtins directly (for example __builtin_ia32_pblendw256) be optimized too? If so, wouldn't it be better to leave _mm256_blend_epi16 as is, remove __builtin_ia32_pblendw256 from BuiltinsX86.def, and make it a #define that expands to shufflevector?
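To make the question concrete, here is a rough plain-C model of why a 256-bit pblendw can be written as a two-input shuffle. It is only a sketch, not the patch itself: the helper names (blend_epi16_model, blend_to_shuffle_mask, shuffle2, shuffle_matches_blend) are made up for illustration, and it uses scalar arrays rather than the real vector types or __builtin_shufflevector. It assumes the usual semantics: bit (i % 8) of the immediate selects element i from the second operand, and shuffle indices 0..15 name the first operand while 16..31 name the second.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* number of 16-bit elements in a 256-bit vector */

/* Reference model of a 256-bit pblendw: the 8-bit immediate is reused for
   both 128-bit halves, so bit (i % 8) picks b[i] when set and a[i] when clear. */
static void blend_epi16_model(const int16_t a[LANES], const int16_t b[LANES],
                              uint8_t imm, int16_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = ((imm >> (i % 8)) & 1) ? b[i] : a[i];
}

/* Hypothetical helper: turn the blend immediate into a shuffle mask.
   Index i selects a[i]; index i + LANES selects b[i]. */
static void blend_to_shuffle_mask(uint8_t imm, int idx[LANES]) {
    for (int i = 0; i < LANES; ++i)
        idx[i] = ((imm >> (i % 8)) & 1) ? i + LANES : i;
}

/* Generic two-input shuffle over the concatenation of a and b. */
static void shuffle2(const int16_t a[LANES], const int16_t b[LANES],
                     const int idx[LANES], int16_t out[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        assert(idx[i] >= 0 && idx[i] < 2 * LANES); /* mask must be in range */
        out[i] = (idx[i] < LANES) ? a[idx[i]] : b[idx[i] - LANES];
    }
}

/* Returns 1 if the shuffle form and the blend model agree for this immediate. */
static int shuffle_matches_blend(uint8_t imm) {
    int16_t a[LANES], b[LANES], via_blend[LANES], via_shuffle[LANES];
    int idx[LANES];
    for (int i = 0; i < LANES; ++i) {
        a[i] = (int16_t)i;         /* distinct values so a mismatch is visible */
        b[i] = (int16_t)(100 + i);
    }
    blend_epi16_model(a, b, imm, via_blend);
    blend_to_shuffle_mask(imm, idx);
    shuffle2(a, b, idx, via_shuffle);
    for (int i = 0; i < LANES; ++i)
        if (via_blend[i] != via_shuffle[i])
            return 0;
    return 1;
}
```

A #define for the builtin would presumably have to spell out the sixteen indices as constant expressions of the immediate, much as blend_to_shuffle_mask does in a loop here.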
http://reviews.llvm.org/D3601
