Hi Paul,

All looks fine, okay for trunk.  Thanks!
Just some possible improvements:

On Fri, Jul 19, 2019 at 10:18:47PM -0500, Paul Clarke wrote:
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Maybe all these terribly long lines would be better if they used a
macro?  Something defined in xmmintrin.h I guess, and just for the
attribute part?

> +_mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
> +{
> +  __v8hu __bitmask = vec_splats ((unsigned short) __imm8);
> +  const __v8hu __shifty = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +  __bitmask = vec_sr (__bitmask, __shifty);
> +  const __v8hu __ones = vec_splats ((unsigned short) 0x0001);
> +  __bitmask = vec_and (__bitmask, __ones);
> +  const __v8hu __zero = {0};
> +  __bitmask = vec_sub (__zero, __bitmask);
> +  return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __bitmask);
> +}

You can do a lot better than this, using vgbbd (that's vec_gb in
intrinsics).  It's probably nicest if you splat the __imm8 to all
bytes in a vector, then do the vgbbd, and then you can immediately
vec_sel with the result of that.

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> +{
> +  const __v16qu __hibits = vec_splats ((unsigned char) 0x80);
> +  __v16qu __lmask = vec_and ((__v16qu) __mask, __hibits);
> +  const __v16qu __zero = {0};
> +  __lmask = (vector unsigned char) vec_cmpgt (__lmask, __zero);
> +  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> +}

Can you do this with just a vsrab / vec_sra?  Splat imm 7 to a vec,
sra by that?


Segher