Hi Paul,

All looks fine, okay for trunk.  Thanks!

Just some possible improvements:

On Fri, Jul 19, 2019 at 10:18:47PM -0500, Paul Clarke wrote:
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Maybe all these terribly long lines would be better if they used a
macro?  Something defined in xmmintrin.h I guess, and just for the
attribute part?

> +_mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
> +{
> +  __v8hu __bitmask = vec_splats ((unsigned short) __imm8);
> +  const __v8hu __shifty = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +  __bitmask = vec_sr (__bitmask, __shifty);
> +  const __v8hu __ones = vec_splats ((unsigned short) 0x0001);
> +  __bitmask = vec_and (__bitmask, __ones);
> +  const __v8hu __zero = {0};
> +  __bitmask = vec_sub (__zero, __bitmask);
> +  return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __bitmask);
> +}

You can do a lot better than this, using vgbbd (that's vec_gb in
intrinsics).  It's probably nicest if you splat the __imm8 to all
bytes in a vector, then do the vgbbd, and then you can immediately
vec_sel with the result of that.
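To make the suggestion concrete, here is a portable scalar model (my sketch, not code from the patch) of what vgbbd does to one doubleword, and why splatting __imm8 first produces a ready-made select mask:

```c
#include <stdint.h>

/* Scalar model of vgbbd (Vector Gather Bits by Bytes by Doubleword):
   an 8x8 bit-matrix transpose within the doubleword.  Result byte i
   collects bit i from each of the eight source bytes.  (Bit/byte
   numbering here is little-endian for simplicity; the real
   instruction numbers bits differently, but the splat trick is
   endian-agnostic because all source bytes are equal.)  */
static uint64_t
vgbbd_dw (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)      /* result byte / source bit */
    for (int j = 0; j < 8; j++)    /* source byte / result bit */
      r |= ((x >> (8 * j + i)) & 1) << (8 * i + j);
  return r;
}
```

With __imm8 splatted to every byte (x = imm8 * 0x0101010101010101), result byte i is bit i of __imm8 replicated eight times, i.e. 0x00 or 0xFF, so the shift/and/subtract sequence disappears.  (For the 16-bit blend the per-byte mask would still want widening to halfwords, e.g. with vec_unpackh, before the vec_sel.)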

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> +{
> +  const __v16qu __hibits = vec_splats ((unsigned char) 0x80);
> +  __v16qu __lmask = vec_and ((__v16qu) __mask, __hibits);
> +  const __v16qu __zero = {0};
> +  __lmask = (vector unsigned char) vec_cmpgt (__lmask, __zero);
> +  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> +}

Can you do this with just a vsrab / vec_sra?  Splat imm 7 to a vec,
sra by that?
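For reference, the vec_sra trick works because an arithmetic shift right by 7 smears each byte's sign bit across the whole byte.  A scalar sketch of the per-lane effect (my model, assuming GCC's arithmetic-shift behaviour for signed right shifts):

```c
#include <stdint.h>

/* Per-byte model of vsrab with a shift count of 7: the arithmetic
   shift replicates the high (sign) bit, so bytes with bit 0x80 set
   become 0xFF and all others become 0x00 -- the same mask the
   vec_and + vec_cmpgt pair computes.  (Right-shifting a negative
   int8_t is implementation-defined in C, but GCC shifts
   arithmetically.)  */
static uint8_t
byte_mask (uint8_t b)
{
  return (uint8_t) ((int8_t) b >> 7);
}

/* One lane of _mm_blendv_epi8 built on that mask.  */
static uint8_t
blendv_lane (uint8_t a, uint8_t b, uint8_t mask)
{
  uint8_t m = byte_mask (mask);
  return (uint8_t) ((b & m) | (a & ~m));
}
```

That replaces two vector ops (and, compare) with a single shift against a splatted count of 7.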


Segher
